Poster in Workshop: Causality and Large Models
Are Police Biased? An NLP Approach
Jonathan Choi
Keywords: [ regression analysis ] [ racial profiling ] [ text analysis ] [ police bias ] [ law enforcement ] [ pedestrian stops ] [ empirical legal studies ] [ contraband discovery ] [ racial discrimination ] [ predictive modeling ] [ omitted variable bias ] [ criminal justice ] [ large language models ] [ machine learning ] [ natural language processing ]
Researchers have traditionally run regressions on numerical and categorical data to detect police bias and inform decisions about criminal justice. This approach can only control for a limited set of simple features, leaving significant unexplained variation and raising concerns of omitted variable bias. Using a novel dataset of text from more than a million police stops, we propose a new method applying large language models (LLMs) to incorporate textual data into regression analysis of stop outcomes. Our LLM-boosted approach has considerably more explanatory power than traditional methods and substantially changes inferences about police bias on characteristics like gender, race, and ethnicity. It also allows us to investigate what features of police reports best predict stops and how officers differ in their conduct of stops. Incorporating textual data ultimately permits more accurate and more detailed inferences on criminal justice data.
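The core idea — that adding text-derived features to a regression can both raise explanatory power and change the estimated coefficient on a demographic variable — can be illustrated with a small simulation. The sketch below is not the paper's actual pipeline: the dataset, encoder, and outcome model here are all invented for illustration. It simulates stop outcomes where a demographic indicator correlates with narrative content, so a regression on demographics alone suffers omitted variable bias, while adding stand-in "LLM embedding" features (random vectors in place of real encoder output) recovers a smaller direct coefficient and a better fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 8

# Hypothetical covariates: a binary demographic indicator and a text
# embedding of the officer's narrative. In the paper's setting the
# embedding would come from an LLM encoding real stop reports; here it
# is purely simulated.
group = rng.integers(0, 2, n).astype(float)
text = rng.normal(size=(n, d))
text[:, 0] += group  # narrative content correlates with demographics

# Simulated outcome (e.g. contraband found), driven mainly by the
# narrative feature, with only a small direct demographic effect (0.2).
y = 0.2 * group + 1.0 * text[:, 0] + rng.normal(size=n)

def ols(X, y):
    """Fit OLS with an intercept; return coefficients and in-sample R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta, r2

# Traditional regression: demographics only.
beta_base, r2_base = ols(group[:, None], y)
# "LLM-boosted" regression: demographics plus text features.
beta_full, r2_full = ols(np.column_stack([group[:, None], text]), y)

print(f"group coef, demographics only: {beta_base[1]:.2f}")  # inflated by omitted text
print(f"group coef, with text:         {beta_full[1]:.2f}")  # near the true 0.2
print(f"R^2: {r2_base:.3f} -> {r2_full:.3f}")
```

Because the narrative feature both predicts the outcome and correlates with the demographic indicator, omitting it loads its effect onto the demographic coefficient — the same mechanism by which incorporating textual data can "substantially change inferences about police bias."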