Poster in Workshop: Workshop on Open-World Agents: Synergizing Reasoning and Decision-Making in Open-World Environments (OWA-2024)
HSCL-RL: Mitigating Hallucinations in Multimodal Large Language Models
Zichen Song · Sitan Huang
Keywords: [ Multimodal Large Language Model ] [ Open World ] [ Hallucination ] [ Reinforcement Learning ] [ Contrastive Learning ]
Multimodal large language models (MLLMs) perform well on tasks that combine natural language and visual information, but they remain prone to hallucination, generating content that is unsupported by or contradicts the visual input, especially in open-world environments. This study proposes a method that combines reinforcement learning and contrastive learning to mitigate hallucination in MLLMs. Through Hallucination-Augmented Contrastive Learning (HSCL), hallucinated text is used as hard negative samples to strengthen the alignment between visual and textual representations. In addition, a reinforcement learning framework dynamically adapts the model to open-world environments, further reducing hallucinations. Experimental results show that the proposed method reduces hallucination rates across multiple benchmark datasets and significantly improves overall model performance.
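The abstract does not give implementation details, so the following is only a minimal sketch of what contrastive learning with hallucinated text as hard negatives could look like, assuming an InfoNCE-style objective over paired image and caption embeddings. The function name `hscl_loss`, the tensor shapes, and the temperature value are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def hscl_loss(image_emb, text_emb, hallucinated_emb, temperature=0.07):
    """Contrastive loss using hallucinated captions as hard negatives.

    image_emb:        (B, D) visual embeddings
    text_emb:         (B, D) embeddings of the matching truthful captions
    hallucinated_emb: (B, D) embeddings of hallucinated captions for the same images
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    hallucinated_emb = F.normalize(hallucinated_emb, dim=-1)

    # Similarities of each image to every truthful caption in the batch: (B, B).
    # The diagonal entries are the positives; off-diagonal are in-batch negatives.
    sim_truthful = image_emb @ text_emb.t() / temperature

    # Similarities to hallucinated captions: (B, B) hard negatives,
    # since each one describes its image plausibly but incorrectly.
    sim_hallucinated = image_emb @ hallucinated_emb.t() / temperature

    # Each image's logits range over [truthful captions | hallucinated captions].
    logits = torch.cat([sim_truthful, sim_hallucinated], dim=1)  # (B, 2B)

    # The positive for image i is truthful caption i (column index i).
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    return F.cross_entropy(logits, targets)


# Toy usage with random embeddings in place of real encoder outputs.
batch, dim = 8, 256
loss = hscl_loss(torch.randn(batch, dim),
                 torch.randn(batch, dim),
                 torch.randn(batch, dim))
print(loss.item())
```

Under this reading, including hallucinated captions in the denominator forces the image embedding to discriminate the truthful caption not only from unrelated in-batch captions but also from near-miss descriptions, which is what tightens visual-textual alignment. The reinforcement learning component described in the abstract would sit on top of this objective; the abstract gives too little detail to sketch it faithfully here.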