

Poster in Workshop: Time Series in the Age of Large Models

Towards Large-scale Clinical Multi-variate Time-series Datasets

Manuel Burger · Fedor Sergeev · Malte Londschien · Daphné Chopard · Hugo Yèche · Eike Gerdes · Polina Leshetkina · Alexander Morgenroth · Zeynep Babür · Jasmina Bogojeska · Martin Faltys · Rita Kuznetsova · Gunnar Rätsch


Abstract:

Notable progress has been made in generalist medical Large Language Models (LLMs) across various healthcare areas. However, large-scale modeling of in-hospital time series data, such as vital signs, lab results, and treatments in Intensive Care Units (ICUs), remains underexplored. Existing ICU datasets are relatively small, but combining them can increase patient diversity and improve model robustness. To generalize across hospitals, models must also address distribution shifts caused by varying treatment policies, which requires harmonizing treatment variables across datasets. This work aims to establish a foundation for training large-scale multi-variate time series models on critical care data and to provide a benchmark for studying and addressing distribution shift when transferring machine learning models across hospitals. We introduce a harmonized dataset for research in sequence modeling and transfer learning, representing the first large-scale collection to include core treatment variables. We plan to expand this dataset to further support advances in transfer learning and the development of scalable, generalizable models for critical healthcare applications.
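To make the idea of harmonizing variables across hospitals more concrete, here is a minimal sketch, assuming each source hospital exports records as flat dictionaries with its own column names and units. The source names (`hospital_a`, `hospital_b`), the column and target variable names, and the conversion factors are all hypothetical illustrations; they are not the schema or pipeline used to build this dataset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Maps one source column to a harmonized variable (hypothetical schema)."""
    target: str
    scale: float = 1.0  # multiplicative unit conversion into the target unit


# Hypothetical per-source mapping tables: source column -> harmonized variable.
# Names, units, and factors are illustrative assumptions only.
SOURCE_MAPPINGS: dict[str, dict[str, Mapping]] = {
    "hospital_a": {
        "hr": Mapping("heart_rate"),                      # already beats/min
        "lact_mgdl": Mapping("lactate", 1 / 9.01),        # mg/dL -> mmol/L (approx.)
        "norepi_mcg_min": Mapping("norepinephrine_mcg_min"),
    },
    "hospital_b": {
        "HeartRate": Mapping("heart_rate"),
        "Lactate": Mapping("lactate"),                    # already mmol/L
        "NorepiMgH": Mapping("norepinephrine_mcg_min", 1000 / 60),  # mg/h -> mcg/min
    },
}


def harmonize(source: str, record: dict[str, float]) -> dict[str, float]:
    """Rename and rescale one source record into the shared variable space."""
    mapping = SOURCE_MAPPINGS[source]
    out: dict[str, float] = {}
    for column, value in record.items():
        m = mapping.get(column)
        if m is None:
            continue  # drop variables that are not part of the shared schema
        out[m.target] = value * m.scale
    return out


if __name__ == "__main__":
    print(harmonize("hospital_b", {"HeartRate": 88, "NorepiMgH": 0.6}))
    print(harmonize("hospital_a", {"hr": 92, "lact_mgdl": 18.02}))
```

Keeping the conversions in an explicit per-source table (rather than scattered through preprocessing code) makes each unit assumption auditable, which matters when differences in treatment dosing are themselves a source of distribution shift between hospitals.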
