

Poster in Workshop: AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond

Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes

Hongxuan Liu · Zhiyao Luo · Tingting Zhu


Abstract:

This paper introduces a framework that integrates fine-tuning of large language models (LLMs) with reinforcement learning (RL) for dynamic treatment regimes (DTRs). Within this RL training framework, our bilevel-LLM approach uses feedback from the DTR environment for "RL with Environment Feedback" (RLEF) fine-tuning to achieve best-of-both-worlds results. Experimental results show that the LLM-RLEF agent outperforms both existing RL policies and pure LLM policies on the SimGlucoseEnv treatment regime task, improving sample efficiency, generalizability, and interpretability. Beyond raising DTR performance, RLEF also improves the LLM's question-answering ability on the MMLU-Med, MedQA, and MedMCQA benchmarks.
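The abstract does not include implementation details, so the following is only a minimal sketch of the RLEF idea: a toy softmax policy (standing in for the fine-tuned LLM) proposes discretized insulin doses, the environment returns a reward, and a REINFORCE-style update reinforces doses that beat the running average. ToyGlucoseEnv, the dose grid, and the reward shape are invented stand-ins, not the paper's method or the simglucose API.

```python
import numpy as np

rng = np.random.default_rng(0)
DOSES = np.linspace(0.0, 5.0, 6)   # hypothetical discretized insulin doses (units/hr)

class ToyGlucoseEnv:
    """Crude stand-in glucose dynamics; reward is highest near 110 mg/dL."""
    def reset(self):
        self.glucose = 180.0
        return self.glucose

    def step(self, dose):
        # Insulin lowers glucose; meals and noise push it back up.
        self.glucose += -8.0 * dose + rng.normal(20.0, 5.0)
        reward = -abs(self.glucose - 110.0) / 100.0   # environment feedback
        return self.glucose, reward

logits = np.zeros(len(DOSES))   # stand-in for the LLM's action preferences
baseline, lr = 0.0, 0.05
env = ToyGlucoseEnv()

for episode in range(300):
    env.reset()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_logp, ret = np.zeros_like(logits), 0.0
    for _ in range(10):                    # one short treatment episode
        a = rng.choice(len(DOSES), p=probs)
        _, r = env.step(DOSES[a])
        ret += r
        grad_logp -= probs                 # gradient of log-softmax ...
        grad_logp[a] += 1.0                # ... with respect to the logits
    # REINFORCE with a running baseline: favor doses whose return beats average.
    logits += lr * (ret - baseline) * grad_logp
    baseline += 0.1 * (ret - baseline)

print("preferred dose:", DOSES[np.argmax(logits)], "units/hr")
```

In the paper's setting, the softmax policy above would be replaced by the LLM itself, with the environment reward driving the fine-tuning update in place of the hand-rolled gradient step.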
