Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models
Distribution-based sensitivity analysis for large language models
Paulius Rauba · Qiyao Wei · Mihaela van der Schaar
Keywords: [ large language models ] [ LLM auditing ] [ sensitivity analysis ] [ frequentist statistics ]
We introduce Distribution-based Sensitivity Analysis (DBSA), a novel framework that reformulates Large Language Model (LLM) sensitivity analysis as a frequentist hypothesis testing problem, enabling statistical inference without imposing distributional constraints on the model. The fundamental challenge in testing LLM sensitivity has been distinguishing meaningful changes in model outputs caused by input perturbations from the inherent stochasticity of LLM responses. DBSA addresses this by constructing and comparing empirical output distributions in a low-dimensional similarity space, allowing a comprehensive yet computationally tractable assessment of how input perturbations affect the entire range of possible LLM responses. Our model-agnostic framework enables the evaluation of arbitrary input perturbations on any black-box LLM, yielding interpretable p-values and effect sizes for model auditing and robustness assessment. We empirically demonstrate how DBSA can be used to audit an LLM's sensitivity to input perturbations and to evaluate LLM robustness across several settings. Fundamentally, this work provides practitioners with a powerful framework for auditing how LLM responses change under arbitrary input perturbations.
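To make the workflow concrete, the sketch below illustrates the general pattern the abstract describes: sample stochastic LLM outputs under an original and a perturbed prompt, map them into a low-dimensional similarity space, and run a frequentist two-sample test to obtain a p-value and an effect size. The abstract does not specify how the similarity space is built or which test statistic DBSA uses, so the embedding step, the reference-set similarity features, the energy-distance statistic, the permutation test, and the helper names sample_outputs and embed are all illustrative assumptions rather than the authors' exact method.

    import numpy as np

    # Hypothetical helpers the practitioner supplies (not part of DBSA itself):
    #   sample_outputs(prompt, n) -> list[str]   : n stochastic completions from a black-box LLM
    #   embed(texts) -> np.ndarray of shape (n, d): any off-the-shelf text embedding model

    def similarity_features(emb, reference):
        """Project output embeddings into a low-dimensional similarity space:
        cosine similarity of each output to a small set of reference outputs."""
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
        return emb @ ref.T  # shape (n, k), k = number of reference outputs

    def energy_distance(x, y):
        """Energy distance between two samples; used here as the test statistic."""
        def mean_dist(a, b):
            return np.mean(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1))
        return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

    def permutation_test(x, y, n_perm=1000, seed=0):
        """Two-sample permutation test: returns a p-value and the observed
        statistic, which serves as an interpretable effect size."""
        rng = np.random.default_rng(seed)
        observed = energy_distance(x, y)
        pooled = np.vstack([x, y])
        n, count = len(x), 0
        for _ in range(n_perm):
            perm = rng.permutation(len(pooled))
            count += energy_distance(pooled[perm[:n]], pooled[perm[n:]]) >= observed
        p_value = (count + 1) / (n_perm + 1)
        return p_value, observed

    # Illustrative usage (prompts and sample sizes are placeholders):
    # base = embed(sample_outputs(prompt, 50))
    # pert = embed(sample_outputs(perturbed_prompt, 50))
    # ref = base[:5]                         # reference outputs defining the similarity space
    # p, effect = permutation_test(similarity_features(base[5:], ref),
    #                              similarity_features(pert, ref))

Because the test is permutation-based and operates on empirical samples of outputs, it places no distributional assumptions on the model, matching the model-agnostic, black-box setting described above.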