

Poster
in
Workshop: Foundation Model Interventions

Linearly Controlled Language Generation with Performative Guarantees

Emily Cheng · Marco Baroni · Carmen Amo Alonso

Keywords: [ optimal control ] [ large language models ]


Abstract:

With the increased use of Large Language Models (LMs) comes a need for controlled text generation strategies with performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. We take the view that the generation of each natural-language token traces a trajectory in this continuous space, realized by the LM's hidden-layer activations. This view permits a control-theoretic treatment of text generation in latent space, where we propose a lightweight, gradient-free intervention that is guaranteed (in probability) to steer trajectories away from regions corresponding to undesired meanings. We demonstrate on toxicity and negativity use cases that the intervention steers language away from undesired content while maintaining text quality.
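The core idea of steering under a linear concept representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the undesired concept occupies a half-space defined by a hypothetical unit direction `w` and threshold `tau`, and applies a minimal-norm, gradient-free correction whenever a hidden state crosses into that region.

```python
import numpy as np

def steer(h, w, tau):
    """Gradient-free linear steering sketch (hypothetical, not the paper's code).

    If the hidden state h lies in the undesired half-space {h : w @ h > tau},
    project it back onto the boundary w @ h = tau with the smallest possible
    correction; otherwise leave it untouched.
    """
    w = w / np.linalg.norm(w)       # unit concept direction
    score = w @ h                   # linear concept score of the state
    if score > tau:                 # trajectory entered the undesired region
        h = h - (score - tau) * w   # minimal-norm correction onto the boundary
    return h

# Toy usage: a state deep inside the undesired region gets clamped to tau.
w = np.array([1.0, 0.0, 0.0, 0.0])
h = 2.0 * w                         # concept score = 2.0 > tau
h_new = steer(h, w, tau=0.5)
print(w @ h_new)                    # score is now exactly at the boundary, 0.5
```

Because the correction only removes the component along `w` that exceeds the threshold, the remaining directions of the hidden state (and hence, ideally, unrelated aspects of the text) are left unchanged.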
