Poster in Workshop: Foundation Model Interventions
Overcoming Limitations of Steering Vectors with Low-Rank Representation Steering
Dmitrii Krasheninnikov · David Krueger
Keywords: [ representation engineering ] [ steering vectors ] [ activation steering ] [ controlled generation ]
This paper studies the limitations of steering vector methods for controlling neural network outputs, and introduces Low-rank Representation Steering (LoReSt) as a more effective alternative. We use a toy multi-label classification setup to systematically evaluate steering methods across tasks of varying complexity. Our key contributions are: (1) a clear example showing that existing methods, which translate activations by a fixed vector, can be insufficient for model steering; (2) the introduction of LoReSt, which instead steers by adding a vector that depends linearly on the source activations; and (3) ablations showing that LoReSt outperforms steering vectors in constrained activation spaces and when steering requires more complex transformations, but is less data-efficient on easy steering tasks. We also find that layer normalization significantly benefits both LoReSt and steering vector methods. We conclude by discussing the limitations of this work: our setup models only categorical features, and we do not run experiments with LLMs.
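The contrast between the two kinds of steering can be sketched in a few lines of NumPy. This is an illustration only, not the authors' implementation: the abstract does not give LoReSt's exact parameterization, so the factorization into `A` and `B`, the optional offset `b`, and the two-point toy example are all assumptions made here for clarity.

```python
import numpy as np

def steer_with_vector(h, v):
    """Steering vector: translate every activation by the same fixed vector v."""
    return h + v

def steer_low_rank(h, A, B, b):
    """Low-rank steering (assumed form): add a vector that depends linearly on h.

    h: (n, d) activations, A: (d, r), B: (r, d), b: (d,) constant offset.
    The added term h @ A @ B has rank at most r.
    """
    return h + h @ A @ B + b

rng = np.random.default_rng(0)
n, d, r = 5, 8, 2
h = rng.normal(size=(n, d))
v = rng.normal(size=d)
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, d))

# The fixed-vector shift is identical for every input; the low-rank shift is not.
shift_fixed = steer_with_vector(h, v) - h
shift_lowrank = steer_low_rank(h, A, B, np.zeros(d)) - h

# Hypothetical toy case: swap two activations that differ in the sign of
# their first coordinate. No single translation can send each to the other.
h_toy = np.array([[1.0, 0.5], [-1.0, 0.5]])
target = np.array([[-1.0, 0.5], [1.0, 0.5]])

# Least-squares optimal translation is the mean required shift (zero here).
v_best = (target - h_toy).mean(axis=0)
err_fixed = np.abs(steer_with_vector(h_toy, v_best) - target).max()

# A rank-1 update that reflects the first coordinate solves the task exactly.
A_toy = np.array([[-2.0], [0.0]])
B_toy = np.array([[1.0, 0.0]])
err_lowrank = np.abs(steer_low_rank(h_toy, A_toy, B_toy, np.zeros(2)) - target).max()

print("fixed-vector error:", err_fixed)
print("low-rank error:", err_lowrank)
```

Setting `A` to zero reduces the low-rank form to plain translation by `b`, so under this assumed parameterization steering vectors are a special case of the low-rank map.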