Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges
Specialized Foundation Models Struggle to Beat Traditional Supervised Learning Baselines
Ritvik Gupta · Zongzhe Xu · Wenduo Cheng · Alexander Shen · Junhong Shen · Ameet Talwalkar · Misha Khodak
Keywords: [ foundation models ] [ autoregressive models ] [ hyperparameter optimization ] [ supervised learning ] [ genomics ] [ satellite imaging ] [ convolutional networks ] [ neural architecture search ] [ time series ]
Following its success in vision and text, the "foundation model" (FM) paradigm, in which large models are pretrained on massive datasets and then fine-tuned on target tasks, has been rapidly extended to the sciences, engineering, healthcare, and beyond. While each successive model improves on its predecessors, it is less clear whether this progress has achieved what FMs accomplished in vision and text: the supplanting of traditional supervised learning. We present a comprehensive investigation across three modalities (genomic sequencing, satellite imagery, and time series) in which multiple FMs have been developed, focusing on careful comparison with a standard supervised learning workflow: architecture development, hyperparameter optimization, and model training, all using only data from the target task. Our study of over thirty tasks shows that it is consistently possible to train a reasonably simple model, no more complicated than a lightly modified wide ResNet or UNet, that outperforms specialized FMs in these domains. Specifically, we attain state-of-the-art performance on the Nucleotide Transformer benchmark in genomics, compete with the latest pretrained Transformers for satellite imaging, and use an autoregressive model to outperform all but one of the evaluated FMs on a suite of time series tasks. Our work demonstrates that the benefits of large-scale pretraining have yet to be fully realized in many specialized domains, reinforces the need to compare new FMs against strong, well-tuned baselines, and introduces an easy-to-use, open-source, automated workflow for making this comparison.
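The baseline workflow the abstract describes (choose a model, tune hyperparameters, and train, using only target-task data) can be illustrated with a toy sketch. This is not the authors' code or tasks: the linear model, the synthetic data, and the random-search budget below are all illustrative stand-ins for the "architecture development, hyperparameter optimization, and model training" stages.

```python
# Hedged sketch of a tuned supervised baseline (illustrative only, not the
# paper's workflow): random hyperparameter search over the learning rate and
# epoch count of a 1-D linear model fit by SGD, using only target-task data.
import random

random.seed(0)

# Toy "target task": y = 2x + noise, split into train/validation splits.
data = [(x / 10.0, 2.0 * x / 10.0 + random.gauss(0, 0.05)) for x in range(50)]
train, val = data[:40], data[40:]

def train_model(lr, epochs):
    """Fit weight w by SGD on squared error over the training split."""
    w = 0.0
    for _ in range(epochs):
        for x, y in train:
            w -= lr * 2 * (w * x - y) * x  # gradient step on (w*x - y)^2
            if abs(w) > 1e6:               # diverged under this lr; stop early
                return w
    return w

def val_loss(w):
    """Mean squared error on the held-out validation split."""
    return sum((w * x - y) ** 2 for x, y in val) / len(val)

# Random search (stand-in for the hyperparameter-optimization stage):
# sample configurations, train each, keep the best by validation loss.
best = None
for _ in range(20):
    lr = 10 ** random.uniform(-3, -0.5)    # log-uniform learning rate
    epochs = random.randint(5, 50)
    w = train_model(lr, epochs)
    loss = val_loss(w)
    if best is None or loss < best[0]:
        best = (loss, lr, epochs, w)

print(f"best val loss {best[0]:.4f}, recovered slope {best[3]:.2f}")
```

The point of the sketch is the protocol, not the model: even a trivial learner, once its hyperparameters are tuned on held-out target-task data, recovers the underlying signal, which is the kind of strong, well-tuned baseline the paper argues new FMs should be compared against.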