

Poster in Workshop: MATH-AI: The 4th Workshop on Mathematical Reasoning and AI

Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning

Timofey Fedoseev · Dimitar I. Dimitrov · Timon Gehr · Martin Vechev

Keywords: [ counting problems ] [ mathematical problem solving ] [ large language models ] [ fine-tuning ] [ satisfiability modulo theories ]


Abstract:

Mathematical reasoning with large language models (LLMs) is an emerging research area. A recent breakthrough is training LLMs to use off-the-shelf tools, offloading complex tasks they cannot perform on their own. Unfortunately, this approach is limited to popular tools, as many specialized tools lack the data needed to train such models. Motivated by our observation that the tools currently used with LLMs are insufficient for solving counting problems, in this work we explore the use of Satisfiability Modulo Theories (SMT) solvers with LLMs. Specifically, we leverage the SMT grammar to generate synthetic data consisting of problem statements and their solutions, the latter represented as Python code interacting with the Z3 API. Our experiments show that fine-tuning LLMs on this dataset substantially enhances their ability to generate accurate Z3 constraint encodings and improves their overall mathematical problem-solving capabilities.
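
To illustrate the solution format the abstract describes, the following is a minimal sketch, not taken from the paper, of Python code that encodes a simple counting problem as Z3 constraints and counts satisfying models by enumeration. The specific problem and variable names are illustrative assumptions only.

# Illustrative counting problem: how many integers x with 1 <= x <= 100
# are divisible by 3 but not by 5?
from z3 import Int, Solver, And, Not, sat

x = Int("x")
solver = Solver()
solver.add(And(1 <= x, x <= 100))   # domain constraint
solver.add(x % 3 == 0)              # divisible by 3
solver.add(Not(x % 5 == 0))         # not divisible by 5

count = 0
while solver.check() == sat:
    model = solver.model()
    count += 1
    # Block the current solution so the next check finds a different one.
    solver.add(x != model[x])

print(count)  # 27

A fine-tuned model producing this kind of encoding delegates the combinatorial search to the solver rather than attempting to count in natural language.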
