

Poster in Workshop: GenAI for Health: Potential, Trust and Policy Compliance

Enhancing Medical VQA with Multimodal Determination Rationales

Xiaotang Gai · Chenyi Zhou · Jiaxiang Liu · Yang Feng · Jian Wu · Zuozhu Liu

Keywords: [ Medical visual question answering; Medical decision-making rationales ]


Abstract:

Medical Visual Question Answering (MedVQA), which provides language responses to image-based medical inquiries, is a challenging task and a significant advancement for healthcare. It assists medical experts in swiftly interpreting medical images, enabling faster and more accurate diagnoses. However, the interpretability and transparency of existing MedVQA solutions are often limited, making their decision-making processes difficult to understand. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build three new benchmark MedVQA datasets: R-RAD, R-SLAKE, and R-Path. These datasets augment the question-answer pairs in the existing MedVQA datasets VQA-RAD, SLAKE, and PathVQA with intermediate medical decision-making rationales produced by multimodal large language models and human annotation. Moreover, we design a novel framework, MedThink, which fine-tunes lightweight pretrained generative models by incorporating these medical decision-making rationales. MedThink includes three distinct strategies for generating decision outcomes and corresponding rationales, clearly exposing the medical decision-making process during reasoning. Our comprehensive experiments show that our method achieves an accuracy of 83.5% on R-RAD, 86.3% on R-SLAKE, and 87.2% on R-Path, significantly exceeding existing state-of-the-art models with comparable parameter counts. Datasets and code will be released.
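To make the rationale-augmentation idea concrete, here is a minimal Python sketch of how a question-answer pair might be expanded into per-strategy training targets. The three strategies shown (answer-only, answer-then-rationale, rationale-then-answer) and all field names are illustrative assumptions for this sketch, not the paper's confirmed design.

```python
# Hypothetical sketch: turning one rationale-annotated MedVQA example into
# sequence-to-sequence training targets, one per generation strategy.
# Strategy names and data fields are assumptions, not the paper's exact API.
from dataclasses import dataclass


@dataclass
class MedVQAExample:
    question: str   # image-based medical inquiry
    answer: str     # gold answer from the source dataset
    rationale: str  # intermediate decision-making rationale (e.g., from R-RAD)


def build_targets(ex: MedVQAExample) -> dict[str, str]:
    """Return one target sequence per (assumed) generation strategy."""
    return {
        "answer_only": ex.answer,
        "answer_then_rationale": f"{ex.answer}, because {ex.rationale}",
        "rationale_then_answer": f"{ex.rationale} Therefore, the answer is {ex.answer}.",
    }


if __name__ == "__main__":
    ex = MedVQAExample(
        question="Is there evidence of pneumothorax in this chest X-ray?",
        answer="No",
        rationale="the lung fields are fully expanded with no visible pleural line.",
    )
    for strategy, target in build_targets(ex).items():
        print(f"[{strategy}] {target}")
```

Each target would then serve as the output sequence when fine-tuning a lightweight generative model on (image, question) inputs, so the model learns to emit the rationale alongside the decision rather than the answer alone.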
