Poster in Workshop: Synthetic Data Generation with Generative AI
Knowledge-Infused Prompting Improves Clinical Text Generation with Large Language Models
Ran Xu · Hejie Cui · Yue Yu · Xuan Kan · Wenqi Shi · Yuchen Zhuang · Wei Jin · Joyce Ho · Carl Yang
Keywords: [ Prompting ] [ Synthetic Data Generation ] [ Clinical NLP ] [ Large Language Models ]
Clinical natural language processing (NLP) requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Large language models (LLMs) have recently shown promise in this domain, yet their direct deployment can raise privacy concerns and is constrained by computational resources. To address these challenges, we propose ClinGen, a framework that infuses knowledge into synthetic clinical text generation with LLMs for clinical NLP tasks. It combines clinical knowledge extraction with context-informed LLM prompting: both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Extensive experiments across 7 clinical NLP tasks and 16 datasets show that ClinGen consistently improves performance, effectively aligning the distribution of generated data with that of real datasets and enriching the diversity of the generated training instances.
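To make the prompting idea concrete, the sketch below shows one way a context-informed prompt could be assembled from a sampled clinical topic and writing style. This is a minimal illustration under assumed inputs: the topic list, style list, template wording, and the `build_prompt` helper are hypothetical placeholders, not the actual knowledge graphs, style suggestions, or prompt templates used by ClinGen.

```python
import random

# Illustrative knowledge-infused prompt construction: a clinical topic
# (e.g., an entity from a medical knowledge graph) and a writing style
# (e.g., suggested by an LLM for the target task) steer the generator
# toward in-domain, stylistically realistic synthetic examples.
# All lists and wording below are placeholders, not ClinGen's resources.

CLINICAL_TOPICS = [
    "atrial fibrillation",
    "type 2 diabetes mellitus",
    "community-acquired pneumonia",
]

WRITING_STYLES = [
    "a concise discharge summary",
    "a nursing progress note",
    "a radiology report impression",
]


def build_prompt(task_description: str, label: str) -> str:
    """Assemble a context-informed prompt for one synthetic training example."""
    topic = random.choice(CLINICAL_TOPICS)
    style = random.choice(WRITING_STYLES)
    return (
        f"Task: {task_description}\n"
        f"Write {style} about a patient with {topic}.\n"
        f"The text should serve as a training example for the label '{label}'.\n"
        f"Return only the clinical text."
    )


if __name__ == "__main__":
    prompt = build_prompt(
        task_description="generate training data for clinical text classification",
        label="cardiovascular",
    )
    print(prompt)  # this prompt would then be sent to an LLM to produce synthetic text
```

In this sketch, sampling a fresh topic and style per example is what drives diversity in the generated data, while constraining topics to domain entities keeps the synthetic text aligned with the clinical distribution.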