Spotlight in Workshop: Algorithmic Fairness through the lens of Metrics and Evaluation
Fair Summarization: Bridging Quality and Diversity in Extractive Summaries
Sina Bagheri Nezhad · Sayan Bandyapadhyay · Ameeta Agrawal
Keywords: NLP
Sat 14 Dec, 9:00 a.m.–5:30 p.m. PST
Fairness in multi-document summarization of user-generated content remains a critical challenge in natural language processing (NLP). Existing summarization methods often fail to ensure equitable representation across different social groups, leading to biased outputs. In this paper, we introduce two novel methods for fair extractive summarization: FairExtract, a clustering-based approach, and FairGPT, which leverages GPT-3.5-turbo with fairness constraints. Both models ensure fairness by balancing the representation of social groups, here identified by White-aligned, Hispanic, and African-American dialects. We evaluate these methods on a dataset of tweets, comparing them against relevant baselines using a comprehensive set of summarization quality metrics, including SUPERT, BLANC, SummaQA, BARTScore, and UniEval, as well as a fairness metric, F. Our results demonstrate that FairExtract and FairGPT achieve superior fairness while maintaining competitive summarization quality. Additionally, we introduce composite metrics (e.g., SUPERT+F, BLANC+F) that integrate quality and fairness into a single evaluation framework, offering a more nuanced understanding of the trade-offs between these objectives. This work highlights the importance of fairness in summarization and sets a benchmark for future research on fairness-aware NLP models.
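The abstract does not spell out how the fairness metric F or the composite quality-plus-fairness scores are computed, so the sketch below is only an illustration under stated assumptions, not the paper's method. It assumes a round-robin balanced extraction over dialect groups, defines F as one minus the total-variation distance between the summary's group shares and equal shares, and combines F with a quality score (assumed pre-normalized to [0, 1]) by an equal-weight average. The names `balanced_extract`, `fairness_F`, and `composite` are hypothetical.

```python
# Hedged sketch: NOT the paper's implementation. Illustrates one plausible
# reading of (a) fairness-constrained extractive selection and (b) a
# composite quality+fairness metric, under the assumptions stated above.
from collections import Counter


def balanced_extract(tweets: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Pick k (text, group) pairs, keeping groups as evenly represented
    as possible via round-robin selection over the groups."""
    by_group: dict[str, list[str]] = {}
    for text, group in tweets:
        by_group.setdefault(group, []).append(text)
    groups = sorted(by_group)
    summary: list[tuple[str, str]] = []
    i = 0
    while len(summary) < k and any(by_group.values()):
        g = groups[i % len(groups)]
        if by_group[g]:
            summary.append((by_group[g].pop(0), g))
        i += 1
    return summary


def fairness_F(summary: list[tuple[str, str]], groups: list[str]) -> float:
    """Assumed fairness score: 1 minus the total-variation distance between
    the summary's group distribution and equal shares (F = 1.0 when every
    group holds exactly 1/m of the summary)."""
    if not summary:
        return 0.0
    counts = Counter(group for _, group in summary)
    n, m = len(summary), len(groups)
    # Halving the L1 gap gives the total-variation distance.
    tv = sum(abs(counts[g] / n - 1 / m) for g in groups) / 2
    return 1.0 - tv


def composite(quality: float, F: float) -> float:
    """Assumed composite metric: equal-weight mean of a quality score
    already normalized to [0, 1] and the fairness score F."""
    return 0.5 * quality + 0.5 * F


if __name__ == "__main__":
    tweets = [("t1", "White-aligned"), ("t2", "White-aligned"),
              ("t3", "Hispanic"), ("t4", "African-American")]
    groups = ["White-aligned", "Hispanic", "African-American"]
    summary = balanced_extract(tweets, k=3)
    print(summary)                                    # one tweet per group
    print(fairness_F(summary, groups))                # 1.0: perfectly balanced
    print(composite(quality=0.7, F=fairness_F(summary, groups)))  # 0.85
```

A composite defined this way keeps the two objectives on a common [0, 1] scale, so a method cannot dominate the score by excelling on quality while ignoring representation; whether the paper uses this particular weighting is not stated in the abstract.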