Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models
Skilling laws: scaling laws for LLM benchmark performance
Felipe Maia Polo · Seamus Somerstep · Leshem Choshen · Yuekai Sun · Mikhail Yurochkin
Keywords: [ skills ] [ scaling laws ] [ benchmarks ] [ evaluation ] [ LLMs ]
In this work, we introduce a series of scaling laws capable of accurately predicting the performance of a larger model from the performance of smaller models in the same family. This allows practitioners to make informed decisions about whether to scale up, such as whether to train a 70B-parameter model based on the results of an 8B-parameter model. Our proposed class of scaling laws, termed Skills Scaling Laws (SSLaws, pronounced Sloth), utilizes data from multiple model families and benchmarks to provide more accurate and interpretable predictions. Sloth introduces a latent model-ability vector that links model performance across popular benchmarks, and it is the first scaling law to incorporate the impact of instruction tuning on model performance. We explore how increasing the number of families and benchmarks affects the predictive accuracy of our scaling law, demonstrating its effectiveness by predicting the performance of larger models, such as LLaMA 3 70B, on all benchmarks in the Open LLM Leaderboard v1/v2 to within 3.5 percentage points of the true performance on average.
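As a rough illustration of the prediction task the abstract describes, the toy sketch below extrapolates a larger model's benchmark accuracy from smaller models in the same family by fitting a line in logit(accuracy) versus log(parameter count). This is not the paper's method: Sloth additionally pools data across families and benchmarks through a shared latent ability vector and models instruction tuning, whereas this sketch fits a single benchmark and family independently. All parameter counts and accuracies here are made up.

```python
import numpy as np

def logit(p):
    # Map accuracy in (0, 1) to an unbounded scale suitable for a linear fit.
    return np.log(p / (1.0 - p))

def sigmoid(z):
    # Inverse of logit: map the linear prediction back to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_large(params_small, acc_small, params_large):
    """Fit logit(acc) = a * log(N) + b on small models, extrapolate to N_large.

    A deliberately simplified stand-in for a skills scaling law: one benchmark,
    one model family, no shared latent skills across benchmarks.
    """
    x = np.log(np.asarray(params_small, dtype=float))
    y = logit(np.asarray(acc_small, dtype=float))
    a, b = np.polyfit(x, y, 1)  # least-squares line on the logit scale
    return sigmoid(a * np.log(params_large) + b)

# Hypothetical family: 1B and 8B models scored on one benchmark;
# predict the (unseen) 70B model's accuracy on that benchmark.
pred = predict_large([1e9, 8e9], [0.45, 0.62], 70e9)
```

Fitting on the logit scale keeps the extrapolated prediction inside (0, 1), which a linear fit on raw accuracy would not guarantee.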