Poster in Workshop: Medical Imaging meets NeurIPS
M3-X: Multimodal Generative Model for Screening Mammogram Reading and Explanation
Man Luo · Amara Tariq · Bhavik Patel · Imon Banerjee
The United States Food and Drug Administration (FDA) has approved multiple automated mammogram image reading models, but most of these models lack interpretability. Efforts have been made to interpret a model's decisions through saliency maps or Grad-CAMs~\cite{selvaraju2017grad} that highlight the model's attention on specific areas within the image. While technically sound, these interpretability maps may not be well perceived by radiologists due to the ambiguity and uncertainty of the highlighted findings. As such, we hypothesize that, in addition to deriving the diagnosis, a text-based semantic explanation of a model's attention (similar to the findings documented in radiology reports) may be more readily understandable by humans and may therefore serve as a more trustworthy component of an AI model. The purpose of our study was thus to develop a transformer-based multimodal generative model for the automatic interpretation of screening mammogram studies and the generation of text-based reasoning. Experimental results on the X-Institution\footnote{For anonymity, we use X-Institution to avoid revealing the institution's identity.} mammogram screening dataset demonstrate that our model significantly outperforms the baselines in both accuracy and the quality of its explanations.
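For readers unfamiliar with the Grad-CAM maps~\cite{selvaraju2017grad} discussed above, the core weighting step can be sketched as follows. This is a minimal NumPy illustration of the published technique, not code from our model: the array shapes, function name, and toy inputs are all illustrative assumptions.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from a convolutional layer's
    activations and the gradients of the class score with respect
    to those activations.

    activations, gradients: arrays of shape (channels, H, W).
    Returns an (H, W) map, ReLU-clipped and max-normalized.
    """
    # Channel importance weights alpha_k: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                 # shape (C,)
    # Weighted sum of activation maps, then ReLU to keep positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for overlaying on the image (skip if all zeros).
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 4 channels of 8x8 feature maps with random values.
rng = np.random.default_rng(0)
acts = rng.random((4, 8, 8))
grads = rng.random((4, 8, 8))
heatmap = grad_cam(acts, grads)
print(heatmap.shape, float(heatmap.max()))  # -> (8, 8) 1.0
```

In practice the activations and gradients would be captured from a trained network (e.g., via framework hooks) for the predicted class; the resulting map is then upsampled to the input resolution and overlaid on the mammogram, which is exactly the kind of spatial-only explanation whose ambiguity motivates our text-based alternative.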