Poster in Workshop: Learning from Time Series for Health
Identifying Structure in the MIMIC ICU Dataset
Qi Qi Chin
The MIMIC-III dataset, which contains the trajectories of 40,000 ICU patients, is one of the most widely used datasets in machine learning for health. However, there has been very little systematic exploration of the natural structure of these data; most analyses impose some form of top-down clustering or embedding. We instead take a bottom-up approach, identifying structures that remain consistent across a range of embedding choices. We identified two dominant structures, ordered by either the fraction of inspired oxygen (FiO2) or creatinine, both of which our clinical co-author validated as key features. Our bottom-up approach to studying the macro-structure of a dataset can also be adapted to other datasets.