Oral in Workshop on Responsibly Building Next Generation of Multimodal Foundation Models

LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Lukas Helff · Felix Friedrich · Manuel Brack · Kristian Kersting · Patrick Schramowski

Keywords: [ Model safeguarding ] [ Dataset curation ] [ Safety ] [ Safeguarding ] [ VLM ]

Sat 14 Dec 11:15 a.m. PST — 11:30 a.m. PST

Abstract:

We introduce LlavaGuard, a family of VLM-based safeguard models that offers a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Furthermore, its customizable taxonomy categories enable context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters that demonstrate state-of-the-art performance, with even the smallest model outperforming baselines such as GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
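To make the intended workflow concrete, here is a minimal Python sketch of how a LLaVA-style safeguard checkpoint could be queried with a policy prompt and its structured response parsed. The repository id, the category names, and the JSON schema below are illustrative assumptions based on the abstract, not details confirmed by this page.

```python
# Sketch: querying a LLaVA-style safeguard model with a safety-policy prompt.
# The repo id "AIML-TUDA/LlavaGuard-7B" and the response schema are assumptions.
import json

from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "AIML-TUDA/LlavaGuard-7B"  # hypothetical checkpoint id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, device_map="auto")

# The safety taxonomy is injected via the prompt; editing this text is what
# would make the categories customizable per deployment context.
policy = (
    "Assess the image against the following safety categories: "
    "O1 Hate & Harassment, O2 Violence, O3 Sexual Content. "
    "Respond in JSON with the keys: rating, category, rationale."
)
prompt = f"USER: <image>\n{policy}\nASSISTANT:"

image = Image.open("example.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
response = processor.decode(output_ids[0], skip_special_tokens=True)

# Expected structured response (per the abstract: rating, violated
# categories, and rationale), e.g.:
# {"rating": "Unsafe", "category": "O2 Violence", "rationale": "..."}
assessment = json.loads(response.split("ASSISTANT:")[-1].strip())
print(assessment["rating"], assessment["category"])
```

If, as in this sketch, the taxonomy is supplied in the prompt rather than baked into the weights, categories can be reworded or dropped per deployment without retraining, which matches the context-specific alignment the abstract describes.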
