

Poster in Workshop: MATH-AI: The 4th Workshop on Mathematical Reasoning and AI

Math for AI: On the Generalization of Learning Mathematical Problem Solving

Ruochen Zhou · Minrui Xu · Shiqi Chen · Junteng Liu · Yunqi Li · Xinxin Lin · Zhengyu Chen · Junxian He

Keywords: [ Reasoning Generalization ] [ Large Language Models ] [ Mathematical Reasoning ]


Abstract:

There has been growing interest in enhancing the mathematical problem-solving (MPS) capabilities of LLMs. While some researchers focus on developing specialized math models to advance AI for math, others study mathematical reasoning from a "math for AI" perspective, positing that integrating mathematical reasoning data could enable LLMs to perform complex reasoning more broadly. This hypothesis draws on neuroscience studies showing that solving mathematical problems aids the development of general reasoning skills in humans. The concept of "math for AI" has gained particular relevance as the research community increasingly focuses on complex reasoning: given the scarcity of complex and lengthy chain-of-thought data, MPS emerges as a prime candidate for collecting or synthesizing substantial volumes of intricate thought processes, and thus as a potential key resource for enhancing general complex reasoning. However, it remains unclear whether skills acquired through learning MPS extend to other reasoning tasks or merely improve MPS-specific benchmark scores. In this paper, we present a comprehensive empirical analysis to address this question. Specifically, we explore three prevalent methods for improving MPS: (1) continual pretraining on mathematical text; (2) instruction pretraining on large-scale QA pairs synthesized from raw text; and (3) instruction tuning on MPS datasets. Through controlled experiments and evaluations across seven distinct reasoning domains, we find that extensive continual pretraining on mathematical texts and instruction pretraining on diverse QA pairs generally improve performance on non-MPS reasoning tasks as well. However, instruction tuning on benchmark-oriented datasets to enhance MPS performance fails to yield significant gains on broader reasoning tasks. These findings indicate that most readily available data sources do not support the "math for AI" objective of enhancing non-MPS tasks. Identifying which data sources best contribute to the acquisition of complex reasoning skills remains a crucial question for future research.
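For readers unfamiliar with how the three training regimes named in the abstract differ in practice, the sketch below illustrates one common way the data and loss masks might be prepared for each. This is our own illustration under stated assumptions, not the authors' code: the whitespace tokenizer is a stand-in for a real subword tokenizer, the function name format_example is hypothetical, and details such as whether instruction pretraining masks the question tokens vary across implementations.

    # Minimal sketch (assumed, not from the paper) of data preparation for the
    # three regimes: continual pretraining, instruction pretraining, instruction tuning.

    IGNORE_INDEX = -100  # label value commonly used to exclude tokens from the LM loss


    def tokenize(text):
        """Stand-in tokenizer: real setups would use the model's subword tokenizer."""
        return text.split()


    def format_example(regime, raw_text=None, question=None, answer=None):
        """Return (tokens, labels) for one training example under a given regime."""
        if regime == "continual_pretraining":
            # Plain language-model loss over raw mathematical text.
            tokens = tokenize(raw_text)
            labels = list(tokens)
        elif regime == "instruction_pretraining":
            # QA pairs synthesized from raw text; here the loss covers the full
            # sequence (an assumption; some implementations mask the question).
            tokens = tokenize(question) + tokenize(answer)
            labels = list(tokens)
        elif regime == "instruction_tuning":
            # Benchmark-style MPS data; loss applied only to the response tokens.
            prompt, response = tokenize(question), tokenize(answer)
            tokens = prompt + response
            labels = [IGNORE_INDEX] * len(prompt) + response
        else:
            raise ValueError(f"unknown regime: {regime}")
        return tokens, labels


    if __name__ == "__main__":
        toks, labs = format_example(
            "instruction_tuning",
            question="What is 12 * 7 ?",
            answer="12 * 7 = 84 .",
        )
        print(list(zip(toks, labs)))  # prompt tokens carry IGNORE_INDEX labels

The design point the sketch highlights is that continual and instruction pretraining expose the model to broad text or QA distributions with loss over (most of) the sequence, whereas instruction tuning narrows supervision to benchmark-style responses, which is consistent with the abstract's finding that only the first two transfer to non-MPS reasoning tasks.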
