

Poster
in
Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Chaeyun Jang · Jungtaek Kim · Hyungi Lee · Juho Lee


Abstract:

Fine-tuning a pretrained model for downstream tasks is a widely adopted technique, known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several engineering choices, such as the selection of hyperparameters and the determination of checkpoints from an optimization trajectory. To tackle the difficulty of choosing the best model among the multiple candidates these choices produce, one effective solution is model fusion, which combines multiple models in a parameter space. However, we observe a large discrepancy between the loss, which is often used to select the models to fuse, and actual metric values. While the loss is generally differentiable and thus easier to optimize, improving the metric of interest is often the more desirable goal. In response, we present a novel model fusion technique that optimizes a desired metric as well as a loss using Bayesian optimization (BO). Moreover, by incorporating multi-objective BO into model fusion, we devise a bilevel framework composed of BO procedures for hyperparameter optimization and model fusion. Experiments across various downstream tasks validate the performance improvements achieved with our BO-based model fusion method.
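The core idea of parameter-space fusion with metric-driven weight selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names (`fuse`, `search_fusion_weights`) and the toy metric are invented for exposition, and plain random search stands in for the Bayesian optimization loop described in the abstract (a real system would use a Gaussian-process surrogate and an acquisition function over the fusion weights).

```python
import random

def fuse(checkpoints, weights):
    """Convex combination of parameter dicts that share the same keys.

    Each checkpoint is modeled as a {name: value} dict; in practice the
    values would be parameter tensors from fine-tuning checkpoints.
    """
    total = sum(weights)
    return {name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints)) / total
            for name in checkpoints[0]}

def search_fusion_weights(checkpoints, eval_metric, n_trials=50, seed=0):
    """Choose fusion weights maximizing a (possibly non-differentiable) metric.

    Random search over the probability simplex is used here purely as a
    placeholder for the BO procedure in the paper.
    """
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(n_trials):
        raw = [rng.random() for _ in checkpoints]
        w = [x / sum(raw) for x in raw]  # project proposal onto the simplex
        score = eval_metric(fuse(checkpoints, w))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

Because the search evaluates the metric directly on fused parameters, it can target non-differentiable quantities (e.g., accuracy or F1) rather than the training loss, which is the discrepancy the abstract highlights.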
