Poster
in
Workshop: Attributing Model Behavior at Scale (ATTRIB)
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
Alaa Khaddaj · Logan Engstrom · Aleksander Madry
Choice of training data distribution greatly affects model behavior. Yet, in large-scale settings, precisely characterizing how changes in training data influence predictions is often difficult due to model training costs. Current practice is to instead extrapolate from scaled-down, inexpensive-to-train proxy models. However, changes in data do not influence smaller and larger models identically. Therefore, understanding how choice of data affects large-scale models raises the question: how does training data influence model behavior across compute scale? We find that the answer is nuanced. Small- and large-scale language model predictions generally correlate highly across choices of training data---often even when small-model predictions are at the level of random guessing. However, there also exist training datasets for which these predictions correlate much less. Equipped with these findings, we characterize how proxy scale affects performance in two downstream proxy-model applications: data attribution and dataset selection.
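The core measurement the abstract describes---how consistently training-data choices affect small proxy models and large target models---can be sketched as a correlation between per-dataset outcomes at the two scales. The helper function, loss values, and dataset count below are all illustrative assumptions, not the paper's actual setup:

```python
# Hypothetical sketch: correlate a small proxy model's per-dataset losses
# with a large model's losses across candidate training datasets.
# All numbers and names here are made up for illustration.

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One entry per candidate training dataset: downstream loss of a model
# trained on that dataset, at small (proxy) and large (target) scale.
small_losses = [2.9, 3.4, 2.5, 3.1, 2.7]  # assumed proxy-model losses
large_losses = [1.8, 2.3, 1.5, 2.0, 1.7]  # assumed large-model losses

r = pearson(small_losses, large_losses)
```

A high `r` would indicate that ranking training datasets with the cheap proxy transfers to the expensive model; the abstract's finding is that this correlation is usually high but not universally so.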