
Poster in Workshop: Mathematics of Modern Machine Learning (M3L)

Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models

Sahar Rajabi · Sirisha Rambhatla

Keywords: [ large language models ] [ optimization ] [ efficient fine-tuning ] [ scaling LLMs ] [ subspace tracking ]


Abstract:

Training and fine-tuning Large Language Models (LLMs) demand significant computational resources and time due to their large model sizes and optimizer states. To mitigate these challenges and improve accessibility, several memory-efficient methods have been developed. Methods such as Low-Rank Adaptation (LoRA) optimize model weights within a low-rank subspace, while Gradient Low-Rank Projection (GaLore) projects gradients into a lower-dimensional space to reduce the memory footprint. In this paper, we propose Gradient Subspace Tracking (SubTrack), a method that confines optimization to a compact core subspace of the gradient matrices and dynamically tracks its changes using the geometry of Grassmannian manifolds. SubTrack efficiently updates its subspace estimation by leveraging estimation errors and previously identified subspaces. Our results demonstrate that even with rank-1 updates to the underlying subspace, SubTrack achieves comparable or superior performance to GaLore, while reducing runtime by approximately 15% on average and by up to 20.56% on some datasets.
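To make the general idea concrete, the NumPy sketch below illustrates gradient subspace projection with a rank-1, Grassmann-style tracking step: the gradient is expressed in a low-rank basis, the basis is nudged along a rank-1 approximation of the Riemannian-gradient direction computed from the projection (estimation) error, and the weight update is applied through the compact subspace. This is a minimal sketch under our own assumptions, not the paper's implementation; all function names, the specific update rule, and the hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_subspace(G, rank):
    """Initialize the basis from the top-`rank` left singular vectors of G."""
    u, _, _ = np.linalg.svd(G, full_matrices=False)
    return u[:, :rank]                        # m x r, orthonormal columns

def rank1_grassmann_step(U, G, lr=0.1):
    """One illustrative rank-1 tracking step for the subspace basis U (m x r).

    Sketch only: move U along a rank-1 approximation of the Riemannian
    gradient of ||U^T G||_F^2 on the Grassmannian, then retract to an
    orthonormal basis via QR. The paper's exact update rule may differ.
    """
    G_low = U.T @ G                           # gradient in the subspace (r x n)
    resid = G - U @ G_low                     # estimation error outside span(U)
    D = resid @ G_low.T                       # Riemannian-gradient direction (m x r)
    u, s, vt = np.linalg.svd(D, full_matrices=False)
    D1 = s[0] * np.outer(u[:, 0], vt[0])      # best rank-1 approximation of D
    U_new, _ = np.linalg.qr(U + lr * D1)      # QR retraction back onto the manifold
    return U_new

def subspace_sgd_step(W, G, U, lr=1e-2):
    """Apply an SGD-like weight update using only the projected gradient."""
    G_low = U.T @ G                           # compact optimizer state lives here
    return W - lr * (U @ G_low)               # project back to full space and step

# Toy usage on a random "layer": track the subspace while updating weights.
m, n, r = 64, 32, 4
W = rng.standard_normal((m, n))
G = rng.standard_normal((m, n))
U = init_subspace(G, r)
for _ in range(5):
    G = rng.standard_normal((m, n))           # stand-in for a fresh gradient
    U = rank1_grassmann_step(U, G)
    W = subspace_sgd_step(W, G, U)
```

Because the optimizer state is kept only in the r-dimensional representation `G_low`, memory scales with the subspace rank rather than the full weight matrix, which is the source of the savings described in the abstract.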
