Poster in Workshop: Workshop on Machine Learning Safety
Mitigating Lies in Vision-Language Models
Junbo Li · Xianhang Li · Cihang Xie
In this work, we bring new insights into the honesty of vision-language models, particularly in visual question answering (VQA). After a thorough revisit of the existing ‘lie’ behavior in pure language models, our work extends the notion of ‘lies’ to vision-language models for the first time. The results indicate that lie prefixes have a more pronounced misleading effect on vision-language models than on language models. We also propose a novel visual prefix and demonstrate that a consistent vision-language prefix is more threatening to vision-language models. To defend the models against these ‘lies’, we put forward an unsupervised framework based on Gaussian mixture modeling and obtain improvements of 3% against the language prefix and 12% against the vision-language prefix.