Poster in Workshop: System-2 Reasoning at Scale
Sampling Language from Latent System 2 Reasoning
Celine Lee · Md Arafat Sultan · Tahira Naseem · Alexander Rush · Ramón Astudillo
Modern language modeling datasets require models to handle system-2 compositional reasoning, fact recall, and task-specific constraints. While these tasks are expressed in natural language, they often imply an underlying symbolic representation. In this work, we consider methods for extracting a latent symbolic representation in an unsupervised manner. We introduce a latent variable modeling approach that models observed data as being generated from a latent generative representation: an executable code program. Code as the latent symbolic representation offers two key advantages. First, code offers a structured space that can be explored via modular functions; second, code is interpretably executable using deterministic and neural interpreters, enabling compositional and programmatic decoding into text. By identifying and composing patterns in this latent space, we can sample programs that produce correct, diverse, and task-relevant text through program execution. We demonstrate how our method induces a latent space with modern LLMs, explore patterns discovered within it, and evaluate text data synthesized from our induced latent space.
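The idea of sampling a latent program and decoding it into text by execution can be illustrated with a toy sketch. This is not the authors' implementation: the arithmetic domain, the function names (`sample_program`, `execute`, `add`, `multiply`), and the text template are all hypothetical, and a real system would induce the program space with an LLM rather than hard-code it.

```python
import random

# Hypothetical modular functions: building blocks of a toy latent program space.
# Each returns the computed value and a phrase describing the operation.
def add(a, b):
    return a + b, f"{a} plus {b}"

def multiply(a, b):
    return a * b, f"{a} times {b}"

OPERATIONS = [add, multiply]

def sample_program(rng):
    """Sample a latent program: an operation together with its arguments."""
    op = rng.choice(OPERATIONS)
    args = (rng.randint(2, 9), rng.randint(2, 9))
    return op, args

def execute(program):
    """Deterministic interpreter: run the program and decode the result into text."""
    op, args = program
    value, phrase = op(*args)
    return f"What is {phrase}? The answer is {value}."

rng = random.Random(0)  # fixed seed so the sampled programs are reproducible
samples = [execute(sample_program(rng)) for _ in range(3)]
for text in samples:
    print(text)
```

Because the text is produced by running the program, each sampled sentence is correct by construction, while diversity comes from sampling different compositions in the latent space.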