Poster in Workshop: Table Representation Learning Workshop (TRL)

From One to Zero: RAG-IM Adapts Language Models for Interpretable Zero-Shot Predictions on Clinical Tabular Data

Sazan Mahbub · Caleb Ellington · Sina Alinejad · Kevin Wen · Yingtao Luo · Ben Lengerich · Eric Xing

Keywords: [ machine learning on clinical tabular data ] [ pre-trained language models ] [ interpretable models ] [ retrieval-augmented generation ]


Abstract:

Clinical machine learning models, often learned from tabular data, must adapt to new settings such as different hospitals, clinicians, or patient populations. These differing environments present related but subtly distinct tasks, where diseases and medical interventions share common foundations but vary in meaningful ways. In contrast to one-size-fits-all invariant feature learning, we believe that representing meaningful differences between domains and adapting to these differences will improve the accuracy, utility, and interpretability of machine learning in health. Here, we introduce Retrieval-Augmented Generation of Interpretable Models (RAG-IM), a highly performant method for adapting statistical models trained on tabular data to new domains based on descriptions of those domains. By leveraging the strengths of Retrieval-Augmented Generation (RAG), our framework retrieves relevant models from related tasks and combines them with contextual insights from pre-trained language models. RAG-IM generates task-specific, interpretable models that perform reliably, even in few-shot and zero-shot scenarios where data are limited or completely unavailable. Through experiments on 7487 related tasks, we find that RAG-IM is a promising general-purpose platform for extending model-based analysis to data-limited and heterogeneous regimes by connecting statistical analysis with natural language.