Poster in Workshop: Foundation Model Interventions
Linearly Controlled Language Generation with Performative Guarantees
Emily Cheng · Marco Baroni · Carmen Amo Alonso
Keywords: [optimal control] [large language models]
With the increased use of Large Language Models (LMs) comes a need for controlled text-generation strategies with performance guarantees. To achieve this, we adopt a common model of concept semantics as linearly represented in an LM's latent space. We take the view that each natural-language generation traces a trajectory in this continuous space, realized by the LM's hidden-layer activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose a lightweight, gradient-free intervention that is guaranteed, in probability, to steer trajectories away from regions corresponding to undesired meanings. On toxicity and negativity use cases, we demonstrate that the intervention steers generation away from undesired content while maintaining text quality.
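The abstract does not spell out the control law, but the linear-representation model it assumes admits a simple illustration: if a concept (e.g. toxicity) corresponds to a direction `w` in activation space, one gradient-free intervention is to project a hidden state back whenever its component along `w` exceeds an allowed margin. The sketch below is a hypothetical minimal version of such a linear steering step, not the paper's actual method; the direction `w`, the margin, and the function name are all illustrative assumptions.

```python
import numpy as np

def steer_away(h, w, margin=0.0):
    """Shift hidden state h so its projection onto concept direction w
    is at most `margin`. Hypothetical linear steering step: the paper's
    actual intervention and its probabilistic guarantee are not shown here."""
    w_unit = w / np.linalg.norm(w)
    score = float(h @ w_unit)        # how strongly h expresses the concept
    if score > margin:               # intervene only when the bound is violated
        h = h - (score - margin) * w_unit
    return h

# Toy example: a random "toxicity" direction (assumed, not learned here)
# and a hidden state pushed strongly along it.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
h = rng.normal(size=8) + 2.0 * w / np.linalg.norm(w)
h_new = steer_away(h, w)
print(f"{abs(float(h_new @ (w / np.linalg.norm(w)))):.6f}")
```

Applied at each generation step, such a projection keeps the activation trajectory out of the half-space associated with the undesired concept while leaving the orthogonal components, and hence most of the representation, untouched.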