Poster in Workshop: Safe Generative AI
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports
Lama Moukheiber · Mira Moukheiber · Dana Moukheiber · Jae-Woo Ju · Hyung-Chul Lee
We introduce a novel and extensive question-answering (QA) dataset built from echocardiogram reports sourced from the Medical Information Mart for Intensive Care (MIMIC) data. The dataset is designed to enhance QA systems in cardiology and consists of 765,605 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare various large language models (LLMs), including open-source general models and biomedical-specific models, alongside state-of-the-art closed-source models for zero-shot evaluation. Our results show that fine-tuning LLMs improves performance across various question-answering metrics, highlighting the validity and value of our dataset. Further, we conduct fine-grained fairness audits to assess the bias-performance tradeoff of LLMs across marginalized populations. Our objective is to propel the field forward by establishing a benchmark framework for developing LLM AI agents that support clinicians in their daily workflow within the cardiology space. By making this dataset available, we aim to support the advancement of natural language models for use in diagnostic decision support systems and, in turn, to increase efficiency in cardiology care.