Workshop
MATH-AI: The 4th Workshop on Mathematical Reasoning and AI
Alex Gu · Gabriel Poesia · Cedegao (Ced) Zhang · Hattie Zhou · Pan Lu · Swaroop Mishra · Kai-Wei Chang · Armando Solar-Lezama
West Meeting Room 118-120
Sat 14 Dec, 8:25 a.m. PST
Mathematical reasoning is a fundamental aspect of human cognition that has been studied by scholars ranging from philosophers to cognitive scientists and neuroscientists. Mathematical reasoning involves analyzing complex information, identifying patterns and relationships, and drawing logical conclusions from evidence. It is central to many applications in science, engineering, finance, and everyday contexts. Recent advancements in large language models (LLMs) have unlocked new opportunities at the intersection of artificial intelligence and mathematical reasoning, ranging from new methods that solve complex problems or prove theorems, to new forms of human-machine collaboration in mathematics and beyond. Our proposed workshop is centered on the intersection of deep learning and mathematical reasoning, with an emphasis on, but not limited to, large language models. Our guiding theme is: "To what extent can machine learning models comprehend mathematics, and what applications could arise from this capability?"
Schedule
Sat 8:25 a.m. - 8:30 a.m.
|
Opening Remarks
SlidesLive Video |
Alex Gu 🔗 |
Sat 8:30 a.m. - 9:00 a.m.
|
Invited Speaker: Dawn Song, UC Berkeley
SlidesLive Video |
Dawn Song 🔗 |
Sat 9:00 a.m. - 9:30 a.m.
|
Invited Speaker: Samy Bengio, Apple
SlidesLive Video |
Samy Bengio 🔗 |
Sat 9:30 a.m. - 10:00 a.m.
|
Invited Speaker: Noam Brown, OpenAI
SlidesLive Video |
Noam Brown 🔗 |
Sat 10:00 a.m. - 11:00 a.m.
|
Panel: Dawn Song, Jeremy Avigad, Noam Brown, Junehyuk Jung (moderator: Swaroop Mishra)
SlidesLive Video |
Dawn Song · Jeremy Avigad · Junehyuk Jung · Swaroop Mishra · Noam Brown 🔗 |
Sat 11:00 a.m. - 12:30 p.m.
|
Coffee Break and Poster Session 1
|
🔗 |
Sat 12:30 p.m. - 1:30 p.m.
|
Lunch Break
|
🔗 |
Sat 1:30 p.m. - 2:00 p.m.
|
Invited Speaker: Adam Wagner, Google DeepMind
SlidesLive Video |
Adam Zsolt Wagner 🔗 |
Sat 2:00 p.m. - 2:30 p.m.
|
Invited Speaker: Jeremy Avigad, CMU
SlidesLive Video |
Jeremy Avigad 🔗 |
Sat 2:30 p.m. - 3:00 p.m.
|
Coffee Break
|
🔗 |
Sat 3:00 p.m. - 3:30 p.m.
|
Invited Speaker: James Zou, Stanford
SlidesLive Video |
James Zou 🔗 |
Sat 3:30 p.m. - 4:00 p.m.
|
Contributed Talks
SlidesLive Video |
🔗 |
Sat 4:00 p.m. - 5:00 p.m.
|
Poster Session 2
|
🔗 |
Sat 5:00 p.m. - 5:05 p.m.
|
Closing Remarks
|
🔗 |
-
|
Math for AI: On the Generalization of Learning Mathematical Problem Solving ( Poster ) > link | Ruochen Zhou · Minrui Xu · Shiqi Chen · Junteng Liu · Yunqi Li · LIN Xinxin · Zhengyu Chen · Junxian He 🔗 |
-
|
CAFA: Coding as Auto-Formulation Can Boost Large Language Models in Solving Linear Programming Problem ( Poster ) > link | Haoxuan Deng · Bohao Zheng · YURI JIANG · Trung Tran 🔗 |
-
|
Probabilistic Proof State Compression: Optimizing LLM-Guided Formal Verification ( Poster ) > link | Noor Rahim · Ali Rahim 🔗 |
-
|
Intermediate Fine-Tuning Improves Mathematical Reasoning in Smaller Models ( Poster ) > link | Neeraj Gangwar · Suma Bhat · Nickvash Kani 🔗 |
-
|
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery ( Poster ) > link | Pingchuan Ma · Tsun-Hsuan Johnson Wang · Minghao Guo · Zhiqing Sun · Josh Tenenbaum · Daniela Rus · Chuang Gan · Wojciech Matusik 🔗 |
-
|
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning ( Poster ) > link | Yihe Deng · Paul Mineiro 🔗 |
-
|
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in LLMs — The Story Goes On ( Poster ) > link | Liang Zeng · Liangjun Zhong 🔗 |
-
|
Learning Mathematical Rules with Large Language Models ( Poster ) > link | Antoine Gorceix · Bastien Le Chenadec · Ahmad Rammal · Nelson Vadori · Manuela Veloso 🔗 |
-
|
Learning Elementary Cellular Automata with Transformers ( Poster ) > link | Mikhail Burtsev 🔗 |
-
|
Repeated examples help learn arithmetic ( Poster ) > link | Francois Charton · Julia Kempe 🔗 |
-
|
Structure Based Dataset on SAT Solving with Graph Neural Networks ( Poster ) > link | Yi Fu · Anthony Tompkins · Yang Song · Maurice Pagnucco 🔗 |
-
|
Proving Olympiad Algebraic Inequalities without Human Demonstrations ( Poster ) > link | Chenrui Wei · Mengzhou Sun · Wei Wang 🔗 |
-
|
MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis ( Poster ) > link | Sagnik Anupam · Matthew Bowers · Omar Costilla Reyes · Armando Solar-Lezama 🔗 |
-
|
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models ( Poster ) > link | Hyunsik Chae · Seungwoo Yoon · Chloe Yewon Chun · Gyehun Go · Yongin Cho · Gyeongmin Lee · Ernest Ryu 🔗 |
-
|
On Memorization of Large Language Models in Logical Reasoning ( Poster ) > link | Chulin Xie · Yangsibo Huang · Chiyuan Zhang · Da Yu · Xinyun Chen · Bill Yuchen Lin · Bo Li · Badih Ghazi · Ravi Kumar 🔗 |
-
|
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data ( Poster ) > link | Shubham Toshniwal · Wei Du · Ivan Moshkov · Branislav Kisacanin · Alexan Ayrapetyan · Igor Gitman 🔗 |
-
|
ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving ( Poster ) > link | Fabian Gloeckle · Jannis Limperg · Gabriel Synnaeve · Amaury Hayat 🔗 |
-
|
How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem ( Poster ) > link | Guan Zhe Hong · Nishanth Dikkala · Enming Luo · Cyrus Rashtchian · Xin Wang · Rina Panigrahy 🔗 |
-
|
STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing ( Poster ) > link | Jiaru Zou · Qing Wang · Pratyush Thakur · Nickvash Kani 🔗 |
-
|
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving ( Poster ) > link | Yangzhen Wu · Zhiqing Sun · Shanda Li · Sean Welleck · Yiming Yang 🔗 |
-
|
Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning ( Poster ) > link | Timofey Fedoseev · Dimitar I. Dimitrov · Timon Gehr · Martin Vechev 🔗 |
-
|
HARDMATH: A Benchmark Dataset for Challenging Problems in Applied Mathematics ( Poster ) > link | Jingxuan Fan · Sarah Martinson · Erik Wang · Kaylie Hausknecht · Jonah Brenner · Danxian Liu · Nianli Peng · Corey Wang · Michael Brenner 🔗 |
-
|
Reasoning and Tools for Forecasting ( Poster ) > link | Elvis Hsieh · Preston Fu · Jonathan Chen 🔗 |
-
|
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation. ( Poster ) > link | Prakhar Dixit · Tim Oates 🔗 |
-
|
miniCTX: Neural Theorem Proving with (Long-)Contexts ( Poster ) > link | Jiewen Hu · Thomas Zhu · Sean Welleck 🔗 |
-
|
Library Learning Doesn’t: The Curious Case of the Single-Use “Library” ( Poster ) > link | Ian Berlot-Attwell · Frank Rudzicz · Xujie Si 🔗 |
-
|
Synchronizing Verbal Responses and Board Writing for Multimodal Math Instruction with LLMs ( Poster ) > link | Yuan-Hao Jiang · Ruijia Li · Yuang Wei · Rui Jia · Xiaobao Shao · Hanglei Hu · Bo Jiang 🔗 |
-
|
Give me a hint: Can LLMs take a hint to solve math problems? ( Poster ) > link | Vansh Agrawal · Pratham Singla · Amitoj Miglani · Shivank Garg · Ayush Mangal 🔗 |
-
|
Math2Sym: A System for Solving Elementary Problems via Large Language Models and Symbolic Solvers ( Poster ) > link | Nguyen Phu · Phuong Pham · Man Ngo · Tuan Minh Kha 🔗 |
-
|
Transformers Can Do Arithmetic with the Right Embeddings ( Poster ) > link | Sean McLeish · Arpit Bansal · Alex Stein · Neel Jain · John Kirchenbauer · Brian Bartoldson · Bhavya Kailkhura · Abhinav Bhatele · Jonas Geiping · Avi Schwarzschild · Tom Goldstein |
-
|
Transformers to Predict the Applicability of Symbolic Integration Routines ( Poster ) > link | Rashid Barket · Uzma Shafiq · Matthew England · Juergen Gerhard 🔗 |
-
|
Mining Math Conjectures from LLMs: A Pruning Approach ( Poster ) > link | Jake Chuharski · Elias Rojas Collins · Mark Meringolo 🔗 |
-
|
Wu’s Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry ( Poster ) > link | Shiven Sinha · Ameya Prabhu · Ponnurangam Kumaraguru · Siddharth Bhat · Matthias Bethge 🔗 |
-
|
Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning ( Poster ) > link | Aryan Gulati · Brando Miranda · Eric Chen · Emily Xia · Kai Fronsdal · Bruno de Moraes Dumont · Sanmi Koyejo 🔗 |
-
|
TurtleBench: A Visual Programming Benchmark in Turtle Geometry ( Poster ) > link | Sina Rismanchian · Yasaman Razeghi · Sameer Singh · Shayan Doroudi 🔗 |
-
|
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning ( Poster ) > link | Xiaotian Han · Yiren Jian · Xuefeng Hu · Haogeng Liu · Yiqi Wang · Qihang Fan · Yuang Ai · Huaibo Huang · Ran He · Zhenheng Yang · Quanzeng You |
-
|
WILT: A Multi-turn, Memorization-Robust Inductive Logic Benchmark for LLMs ( Poster ) > link | Eryk Banatt · Jonathan Cheng · Tiffany Hwu 🔗 |
-
|
Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data ( Poster ) > link | Huajian Xin · Daya Guo · Zhihong Shao · Z.Z. Ren · Qihao Zhu · Bo Liu · Chong Ruan · Wenda Li · Xiaodan Liang 🔗 |
-
|
Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes ( Poster ) > link | Jesse He · Helen Jenne · Herman Chau · Davis Brown · Mark Raugas · Sara Billey · Henry Kvinge 🔗 |
-
|
The Karp Dataset ( Poster ) > link | Mason DiCicco · Eamon Worden · Daniel Reichman · Neil Heffernan · Conner Olsen · Nikhil Gangaram 🔗 |
-
|
Not All LLM Reasoners Are Created Equal ( Poster ) > link | Arian Hosseini · Alessandro Sordoni · Daniel Toyama · Aaron Courville · Rishabh Agarwal 🔗 |
-
|
NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving ( Poster ) > link | Laetitia Teodorescu · Guillaume Baudart · Emilio Arias · Marc Lelarge 🔗 |
-
|
DafnyBench: A Benchmark for Formal Software Verification ( Poster ) > link | Chloe Loughridge · Qinyi Sun · Seth Ahrenbach · Federico Cassano · Chuyue (Livia) Sun · Ying Sheng · Anish Mudide · Md Rakib Hossain Misu · Nada Amin · Max Tegmark 🔗 |
-
|
Looped Transformers for Length Generalization ( Poster ) > link | Ying Fan · Yilun Du · Kannan Ramchandran · Kangwook Lee 🔗 |
-
|
Genetic Curriculum Learning for Distribution Generalization on the Travelling Salesman Problem ( Poster ) > link | Michael Li · Christopher Haberland · Natasha Jaques 🔗 |
-
|
Synthesizing Verified Mathematical Problems ( Poster ) > link | Xuefeng Li · Yanheng He · Pengfei Liu 🔗 |
-
|
VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning ( Poster ) > link | Amirhossein Kazemnejad · Milad Aghajohari · Eva Portelance · Alessandro Sordoni · Siva Reddy · Aaron Courville · Nicolas Le Roux 🔗 |
-
|
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets to Accelerate AI for Mathematics Research ( Poster ) > link | Herman Chau · Helen Jenne · Davis Brown · Jesse He · Mark Raugas · Sara Billey · Henry Kvinge 🔗 |
-
|
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance ( Poster ) > link | Kunal Singh · Ankan Biswas · Sayandeep Bhowmick · Pradeep Moturi 🔗 |
-
|
Models Can and Should Embrace the Communicative Nature of Human-Generated Math ( Poster ) > link | Sasha Boguraev · Ben Lipkin · Leonie Weissweiler · Kyle Mahowald 🔗 |
-
|
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula ( Poster ) > link | Shubhra Mishra · Gabriel Poesia · Belinda Mo · Noah Goodman 🔗 |
-
|
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling ( Poster ) > link | Hritik Bansal · Arian Hosseini · Rishabh Agarwal · Vinh Tran · Mehran Kazemi 🔗 |
-
|
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search ( Poster ) > link | David Brandfonbrener · Simon Henniger · Sibi Raja · Tarun Prasad · Chloe Loughridge · Federico Cassano · Sabrina Hu · Jianang Yang · William Byrd · Robert Zinkov · Nada Amin |
-
|
Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving ( Poster ) > link | Ziyu Ye · Jiacheng Chen · Jonathan Li · Yifei Wang · Jiankai Sun · Mac Schwager · Philip Torr · Guohao Li · Yuxin Chen · Kaiyu Yang · Yisong Yue · Ziniu Hu |
-
|
Attention Bias as an Inductive Bias: How to Teach Transformers Simple Arithmetic ( Poster ) > link | Shaoxiong Duan · Yining Shi · Wei Xu 🔗 |
-
|
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically ( Poster ) > link | Kefan Dong · Arvind Mahankali · Tengyu Ma 🔗 |
-
|
Generative Verifiers: Reward Modeling as Next-Token Prediction ( Poster ) > link | Lunjun Zhang · Arian Hosseini · Hritik Bansal · Mehran Kazemi · Aviral Kumar · Rishabh Agarwal 🔗 |
-
|
Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models ( Poster ) > link | Jonas Zausinger · Lars Pennig · Kacper Chlodny · Vincent Limbach · Anna Ketteler · Thorben Prein · Vishwa Mohan Singh · Michael Danziger · Jannis Born 🔗 |
-
|
A Hessian View of Grokking in Mathematical Reasoning ( Poster ) > link | Zhenshuo Zhang · Jerry Liu · Christopher Ré · Hongyang Zhang 🔗 |
-
|
Lean-STaR: Learning to Interleave Thinking and Proving ( Poster ) > link | Haohan Lin · Zhiqing Sun · Sean Welleck · Yiming Yang 🔗 |
-
|
Formal Representation and Solution of Plane Geometric Problems ( Poster ) > link | Xiaokai Zhang · Na Zhu · Cheng Qin · LI Yang · Zhenbing Zeng · Tuo Leng 🔗 |
-
|
Interleaving Text and Number Embeddings to Solve Mathemathics Problems ( Poster ) > link | Marvin Alberts · Gianmarco Gabrieli · Irina Morales 🔗 |
-
|
AI-Assisted Generation of Difficult Math Questions ( Poster ) > link | Vedant Shah · Dingli Yu · Kaifeng Lyu · Simon Park · Jiatong Yu · Yinghui He · Nan Rosemary Ke · Michael Mozer · Yoshua Bengio · Sanjeev Arora · Anirudh Goyal |
-
|
The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines ( Poster ) > link | Fukun Zhang · Bonan Zhao 🔗 |
-
|
Towards Faster Quantum Circuit Simulation Using Graph Decompositions, GNNs and Reinforcement Learning ( Poster ) > link | Alexander Koziell-Pipe · Richie Yeung · Matthew Sutcliffe 🔗 |
-
|
CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models ( Poster ) > link | ZEYU WANG 🔗 |
-
|
FEABench: Evaluating Language Models on Real World Physics Reasoning Ability ( Poster ) > link | Nayantara Mudur · Hao Cui · Subhashini Venugopalan · Paul Raccuglia · Michael Brenner · Peter Norgaard 🔗 |
-
|
Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting ( Poster ) > link | Tim Knappe · Ryan L Li · Ayush Chauhan · Kaylee Chhua · Kevin Zhu · Sean O'Brien 🔗 |
-
|
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images ( Poster ) > link | Sami Baral · Li Lucy · Ryan Knight · Alice Ng · Luca Soldaini · Neil Heffernan · Kyle Lo 🔗 |