Poster in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI
Fairness Dynamics During Training
Krishna Patel · Nivedha Sivakumar · Barry-John Theobald · Luca Zappella · Nicholas Apostoloff
Keywords: [ fairness ] [ training dynamics ] [ bias ] [ gender bias ] [ LLMs ] [ early stopping ]
Understanding fairness dynamics during Large Language Model (LLM) training facilitates the diagnosis of biases as they emerge and enables developers to mitigate them through early stopping or other training interventions. We introduce two new metrics to evaluate fairness dynamics holistically during model pre-training: Average Rank and Jensen-Shannon Divergence by Parts. These metrics provide insight into how the Pythia models' biases in predicting gender for occupations on the WinoBias dataset evolve over the course of training. We find that Pythia-6.9b becomes more performant at, and more confident in, predicting "male" than "female" during training. By monitoring these dynamics, we find that, via early stopping, Pythia-6.9b can exchange 1.7% accuracy on LAMBADA for a 92.5% increase in fairness. We also find that Pythia-6.9b is more likely than Pythia-160m to exhibit bias and to make assumptions about gender, even when a subject's gender is not specified.
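The abstract does not define Average Rank or Jensen-Shannon Divergence by Parts, so the sketch below is only a rough, hypothetical illustration of the general workflow it describes: score gendered completions at successive pre-training checkpoints, track a divergence-based bias signal against an unbiased reference, and use that signal as an early-stopping criterion. The checkpoint probabilities, the tolerance, and the stopping rule are all invented for illustration and are not the paper's method or its reported numbers.

```python
# Hypothetical sketch of divergence-based fairness monitoring across
# pre-training checkpoints. Not the paper's exact metrics: "Average Rank"
# and "JSD by Parts" are not defined in the abstract, so this only shows
# the general idea. All checkpoint probabilities below are mock values.

import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Mock probabilities a checkpoint assigns to "male" vs. "female"
# completions on gender-neutral, WinoBias-style occupation prompts.
checkpoint_probs = {
    1_000:   [0.52, 0.48],
    10_000:  [0.63, 0.37],
    50_000:  [0.71, 0.29],
    143_000: [0.78, 0.22],
}
UNIFORM = [0.5, 0.5]  # an unbiased reference distribution

for step, probs in sorted(checkpoint_probs.items()):
    bias = js_divergence(probs, UNIFORM)
    print(f"step {step:>7}: P(male)={probs[0]:.2f}  JSD vs. uniform = {bias:.4f}")

# Toy early-stopping rule: stop before the first checkpoint whose
# divergence from the uniform reference exceeds a tolerance, trading
# some downstream accuracy for fairness (analogous in spirit to the
# LAMBADA-accuracy/fairness trade-off the abstract reports).
TOLERANCE = 0.02
stop_step = next(
    (step for step, probs in sorted(checkpoint_probs.items())
     if js_divergence(probs, UNIFORM) > TOLERANCE),
    None,
)
print(f"early-stop before step: {stop_step}")
```

With these mock numbers, the divergence grows monotonically as the model drifts toward "male" predictions, and the toy rule flags the 50,000-step checkpoint; in practice the probabilities would come from scoring real Pythia checkpoints on WinoBias prompts, and the stopping criterion would be whatever fairness/accuracy trade-off the developer targets.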