NeurIPS HateXplain Space Model: Fusing Robustness with Explainability in Hate Speech Analysis

Poster in Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

Md Fahim · Md Shihab Shahriar · Mohammad Ruhul Amin

Abstract:

In Natural Language Processing, Language Models (LMs) excel at many tasks but struggle to identify hate contexts, particularly in zero-shot or transfer learning settings. To address this, we introduce Space Modeling (SM), a novel approach that enhances hate context detection by generating word-level attribution and bias scores. These scores provide intuitive insights into model predictions and help identify hateful terms. Our experiments across six hate speech datasets show that SM outperforms existing methods, a significant advance in LM-based hate context detection.
