Poster in Workshop: Medical Imaging meets NeurIPS
Towards Generalist Models for Multimodal Clinical Diagnostics
Yunxiang Fu · Hong-Yu Zhou · Yizhou Yu
Abstract:
We introduce MMCaD, the first multimodal dataset for general clinical diagnostics, consisting of nearly 60k real-world cases and one thousand health problems. Alongside MMCaD, we present GeMini, a multimodal transformer designed for clinical diagnostics. GeMini decouples the decision-making process into modality-specific encoding and modality-agnostic decoding, optimizing both stages jointly. Experimental results demonstrate that GeMini outperforms existing counterparts in digital medicine and computer vision by up to 6%. Moreover, GeMini does not need pre-trained weights for decoding, allowing a more flexible architecture design.
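To make the split between modality-specific encoding and modality-agnostic decoding concrete, here is a minimal toy sketch in plain Python. It is not GeMini itself: linear projections and mean pooling stand in for the transformer components, and the modality names ("image", "lab_values"), the shared width, the sigmoid multi-label output, and the class name ToyDiagnosticModel are all illustrative assumptions not taken from the abstract. Only the forward pass is shown; the joint optimization of both stages is omitted.

```python
import math
import random

SHARED_DIM = 4  # assumed width of the shared representation (hypothetical)


def make_linear(in_dim, out_dim, rng):
    """Return a random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ToyDiagnosticModel:
    """Toy stand-in: one encoder per modality, one decoder shared by all."""

    def __init__(self, modality_dims, num_labels, seed=0):
        rng = random.Random(seed)
        # Modality-specific encoding: a separate projection per modality,
        # each mapping its own input size into the shared space.
        self.encoders = {
            name: make_linear(dim, SHARED_DIM, rng)
            for name, dim in modality_dims.items()
        }
        # Modality-agnostic decoding: a single head that only ever sees
        # shared-space vectors, regardless of which modality produced them.
        self.decoder = make_linear(SHARED_DIM, num_labels, rng)

    def forward(self, inputs):
        # Encode each available modality with its own encoder.
        tokens = [apply_linear(self.encoders[name], vec) for name, vec in inputs.items()]
        # Mean-pool the encoded tokens (a simplification of attention).
        pooled = [sum(column) / len(tokens) for column in zip(*tokens)]
        logits = apply_linear(self.decoder, pooled)
        # Sigmoid per label, assuming a multi-label diagnosis setup.
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical usage with two made-up modalities and five labels.
model = ToyDiagnosticModel({"image": 6, "lab_values": 3}, num_labels=5)
probs = model.forward({"image": [0.1] * 6, "lab_values": [1.0, 0.5, -0.2]})
print(len(probs))
```

Because the decoder in this sketch depends only on the shared width, an encoder for a further modality could be added without touching the decoder, which is one way to read the flexibility the abstract describes.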