Spotlight in Workshop: Algorithmic Fairness through the lens of Metrics and Evaluation
Benchmark to Audit LLM Generated Clinical Notes for Disparities Arising from Biases and Stereotypes
Hongyu Cai · Swetasudha Panda · Naveen Jafer Nizar · Qinlan Shen · Daeja Oxendine · Sumana Srivatsa · Krishnaram Kenthapadi
Keywords: [ Audits ] [ Evaluation Metrics and Techniques ] [ Bias Detection ] [ Data collection and curation ]
Sat 14 Dec 9 a.m. PST — 5:30 p.m. PST
After each patient encounter, physicians compile extensive, semi-structured clinical summaries known as SOAP notes. These notes, while essential for both clinical practice and research, are time-consuming to generate in a digital format, contributing significantly to physician burnout. Recently, large language models (LLMs) have shown promising abilities in automating the generation of SOAP notes. Despite these advancements, there is a risk that such models could inadvertently cause harm and worsen existing health disparities. It is crucial to systematically evaluate model failures related to equity to ensure the development of clinical documentation tools that uphold principles of health equity. This study introduces a benchmark dataset and proposes methodologies for assessing equity-related harms in LLM-generated, long-form SOAP notes. Our work aims to establish a foundation for ensuring that automated clinical documentation tools are not only efficient but also equitable in their impact on diverse patient populations.