

Poster in Workshop: Time Series in the Age of Large Models

MEDS-torch: An ML Pipeline for Inductive Experiments for EHR Medical Foundation Models

Nassim Oufattole · Teya Bergamaschi · Pawel Renc · Aleksia Kolo · Matthew McDermott · Collin Stultz


Abstract:

We introduce meds-torch, a scalable and extensible pipeline designed to process any medical dataset adhering to the MEDS format—a universal schema for medical time series. We systematically compare three tokenization methods (Everything In Code, Triplet, and Text Code) and evaluate five transfer learning techniques, including variations of autoregressive generative modeling and contrastive learning, across multiple predictive tasks on the MIMIC-IV EHR dataset. Our empirical analysis provides actionable insights into the effectiveness of each method, demonstrating that certain tokenization and pretraining combinations significantly outperform others. By benchmarking these approaches against fully supervised learning models, we offer practical recommendations for selecting appropriate modeling strategies in diverse healthcare settings. The meds-torch pipeline not only streamlines the application of these methods but also promotes reproducibility and standardization in EHR research, facilitating more effective machine learning applications in healthcare.
