Poster in Workshop: Fine-Tuning in Modern Machine Learning: Principles and Scalability
Towards Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Models
Divyanshu Aggarwal · Sankarshan Damle · Navin Goyal · Satya Lokam · Sunayana Sitaram
A common challenge in adapting Large Language Models (LLMs) is enabling them to learn new languages over time without degrading their performance on languages in which they are already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to adapt it to downstream tasks with varying data distributions and time shifts. In this paper, we focus on the language adaptability of LLMs through CFT. Concretely, we study a two-phase CFT process in which an English-only, end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset comprising task data in new languages in Phase 2 (predominantly Language Ability). We observe that the "similarity" of the Phase 2 dataset to the Phase 1 dataset determines the LLM's adaptability: when the phase-wise datasets are similar, the LLM after Phase 2 shows no deterioration in task ability; when they are dissimilar, its task ability deteriorates significantly. We test our hypothesis on the open-source Mistral-7B and LLaMA-3-8B models with multiple phase-wise dataset pairs. To overcome the performance deterioration, we propose two methods based on layer freezing and generative replay. We show that these methods improve the LLM's language ability while preserving its task ability.
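As a rough illustration of the layer-freezing method, the sketch below freezes the lower transformer blocks of a causal LM before Phase 2 fine-tuning, so the representations underlying Phase 1 task ability are preserved while the upper layers adapt to new languages. This is a minimal sketch, not the paper's exact configuration: the checkpoint name, the cutoff `N_FROZEN`, and the optimizer settings are illustrative assumptions.

```python
# Minimal sketch: Phase 2 CFT with the bottom N transformer layers frozen.
# Assumptions (not from the paper): the checkpoint, N_FROZEN, and the
# optimizer hyperparameters are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint for the Mistral-7B experiments
N_FROZEN = 16                             # hypothetical: freeze the first 16 of 32 layers

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Freeze the token embeddings and the bottom N_FROZEN transformer blocks so
# Phase 2 (Language Ability) training cannot overwrite the lower-layer
# representations learned during Phase 1 (Task Ability).
for param in model.model.embed_tokens.parameters():
    param.requires_grad = False
for layer in model.model.layers[:N_FROZEN]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the unfrozen parameters are handed to the Phase 2 optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```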
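Generative replay can be sketched in the same spirit: before Phase 2, the Phase-1 model generates English task examples of its own, which are mixed into the multilingual Phase 2 data so the model rehearses its original task ability while learning new languages. The prompts, data, and mixing ratio below are hypothetical placeholders, assuming `model` and `tokenizer` from the sketch above.

```python
# Minimal sketch of generative replay, continuing from the sketch above:
# `model` and `tokenizer` are the Phase-1 fine-tuned model. All data below
# is a hypothetical placeholder, not the paper's datasets.
import random
import torch

REPLAY_RATIO = 0.2  # assumed fraction of replayed English data in the Phase 2 mix

phase1_prompts = [
    "Answer the question: What is the capital of France?",
    "Summarize: Large language models are trained on web-scale corpora.",
]  # placeholder English task prompts from the Phase 1 distribution

multilingual_examples = [
    "Beantworte die Frage: Was ist die Hauptstadt von Frankreich?",
    "Réponds à la question : quelle est la capitale de la France ?",
]  # placeholder Phase 2 multilingual task data

@torch.no_grad()
def generate_replay_examples(model, tokenizer, prompts, max_new_tokens=128):
    """Sample completions from the Phase-1 model to serve as replay data."""
    replay = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
        replay.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return replay

# Mix self-generated English examples into the multilingual Phase 2 stream so
# the model rehearses its Phase 1 task ability while learning new languages.
replay_texts = generate_replay_examples(model, tokenizer, phase1_prompts)
n_replay = min(int(REPLAY_RATIO * len(multilingual_examples)) or 1, len(replay_texts))
phase2_data = multilingual_examples + random.sample(replay_texts, n_replay)
random.shuffle(phase2_data)
```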