Abstract:

Researchers have traditionally run regressions on numerical and categorical data to detect police bias and inform decisions about criminal justice. This approach can only control for a limited set of simple features, leaving significant unexplained variation and raising concerns of omitted variable bias. Using a novel dataset of text from more than a million police stops, we propose a new method applying large language models (LLMs) to incorporate textual data into regression analysis of stop outcomes. Our LLM-boosted approach has considerably more explanatory power than traditional methods and substantially changes inferences about police bias on characteristics like gender, race, and ethnicity. It also allows us to investigate what features of police reports best predict stops and how officers differ in their conduct of stops. Incorporating textual data ultimately permits more accurate and more detailed inferences on criminal justice data.
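The abstract does not specify the implementation, but the core idea — augmenting a regression on simple covariates with features derived from stop narratives — can be sketched as follows. Everything here is an assumption for illustration: the embeddings are simulated stand-ins for LLM text embeddings, the covariates and outcome are synthetic, and ordinary least squares stands in for whatever estimator the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_embed = 500, 16

# Hypothetical tabular covariates of the kind traditional regressions use
# (e.g. binary indicators for demographic characteristics). Simulated data.
X_tab = rng.integers(0, 2, size=(n, 3)).astype(float)

# Stand-in for LLM embeddings of the police-report text. In practice these
# would come from an embedding model applied to each narrative.
X_text = rng.normal(size=(n, d_embed))

# Synthetic stop outcome, driven partly by information that only appears
# in the narrative text, so the tabular-only model leaves variation unexplained.
beta_tab = np.array([0.5, -0.3, 0.2])
beta_text = rng.normal(scale=0.4, size=d_embed)
y = X_tab @ beta_tab + X_text @ beta_text + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    """R^2 of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

r2_tab = r_squared(X_tab, y)                                   # traditional
r2_boosted = r_squared(np.column_stack([X_tab, X_text]), y)    # text-augmented
print(f"tabular-only R^2: {r2_tab:.3f}")
print(f"text-augmented R^2: {r2_boosted:.3f}")
```

Because the simulated outcome depends on the text features, the augmented regression attains a higher R^2, mirroring the abstract's claim that incorporating textual data adds explanatory power; the coefficients on the tabular covariates can also shift once the text features are included, which is the mechanism by which inferences about bias change.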