

Poster in Workshop: Medical Imaging meets NeurIPS

Towards Generalist Models for Multimodal Clinical Diagnostics

Yunxiang Fu · Hong-Yu Zhou · Yizhou Yu


Abstract:

We introduce MMCaD, the first multimodal dataset for general clinical diagnostics, consisting of nearly 60k real-world cases and one thousand health problems. Alongside MMCaD, we present GeMini, a multimodal transformer designed for clinical diagnostics. GeMini decouples the decision-making process into modality-specific encoding and modality-agnostic decoding, and optimizes both stages jointly. Experimental results demonstrate that GeMini outperforms existing counterparts in digital medicine and computer vision by up to 6%. Moreover, GeMini does not need pre-trained weights for decoding, which allows more flexible architecture design.
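
To make the split between modality-specific encoding and modality-agnostic decoding concrete, the following is a minimal toy sketch in plain Python, not the GeMini implementation. Everything in it is an assumption made for illustration: the class name `ToyDiagnosticModel`, the modality names `image` and `text`, the linear maps standing in for transformer encoders and the decoder, and the mean pooling used to fuse modalities. It covers only the forward pass; in joint optimization, a single loss on the decoder output would update the encoder and decoder weights together.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Create a randomly initialised weight matrix of shape (out_dim, in_dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix (nested lists) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyDiagnosticModel:
    """Hypothetical stand-in: one encoder per modality, one shared decoder."""

    def __init__(self, modality_dims, hidden_dim, num_labels, seed=0):
        rng = random.Random(seed)
        # Modality-specific stage: a separate encoder for each input type,
        # each mapping its own input size to a common hidden size.
        self.encoders = {
            name: make_linear(dim, hidden_dim, rng)
            for name, dim in modality_dims.items()
        }
        # Modality-agnostic stage: a single decoder that only ever sees
        # hidden vectors. It starts from random weights, so no pre-trained
        # decoder is required in this sketch.
        self.decoder = make_linear(hidden_dim, num_labels, rng)

    def forward(self, inputs):
        # Encode each available modality with its own encoder.
        encoded = [apply_linear(self.encoders[name], x) for name, x in inputs.items()]
        # Fuse by averaging across modalities (an assumption of this sketch).
        fused = [sum(values) / len(encoded) for values in zip(*encoded)]
        # Decode to one score per candidate health problem.
        return apply_linear(self.decoder, fused)


model = ToyDiagnosticModel({"image": 4, "text": 3}, hidden_dim=5, num_labels=2)
scores = model.forward({"image": [0.2, 0.1, 0.0, 0.5], "text": [1.0, 0.0, 0.3]})
text_only_scores = model.forward({"text": [1.0, 0.0, 0.3]})
```

Because the decoder depends only on the shared hidden size, a modality can be added, removed, or left out at inference time without changing the decoder, which is one way to read the flexibility claim in the abstract.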
