Poster
in
Affinity Event: Black in AI
Can Large Language Models Provide Emergency Medical Help Where There Is No Ambulance? A Comparative Study on Large Language Model Understanding of Emergency Medical Scenarios in Resource-Constrained Settings
Paulina Boadiwaa Mensah · Nana Seraa Quao · Sesinam Dagadu · Project Genie
There are a few medicine-oriented evaluation datasets and benchmarks for assessing the performance of various LLMs in clinical scenarios; however, there is a paucity of information on the real-world utility of LLMs in context-specific scenarios in resource-constrained settings. In this study, 16 iterations of a decision support tool for medical emergencies using 4 distinct generalized LLMs were constructed, alongside a combination of 4 Prompt Engineering techniques. In total 428 model responses were quantitatively and qualitatively evaluated by 22 clinicians familiar with the medical scenarios and background contexts. The best model-technique pair had a mean rating of 8.05/10. Our study highlights the benefits of In-Context Learning with few-shot prompting, and the utility of the relatively novel self-questioning prompting technique. We demonstrate the benefits of combining various prompting techniques to elicit the best performance of LLMs. We also highlight the need for continuous human expert verification in the development and deployment of LLM-based health applications, especially in use cases where context is paramount.