NeurIPS Mitigating Lies in Vision-Language Models

Poster in Workshop: Workshop on Machine Learning Safety

Mitigating Lies in Vision-Language Models

Junbo Li · Xianhang Li · Cihang Xie

Abstract:

In this work, we bring new insights into the honesty of vision-language models, particularly in visual question answering (VQA). After a thorough revisit of the existing 'lie' behavior in pure language models, our work makes an unprecedented extension of 'lies' to vision-language models. The results indicate that lie prefixes have a more pronounced misleading effect on vision-language models than on language models. We also propose a novel visual prefix and show that a consistent vision-language prefix is more threatening to vision-language models. To defend the models against these 'lies', we put forward an unsupervised framework based on Gaussian mixture modeling and obtain improvements of 3% against the language prefix and 12% against the vision-language prefix.
