Skip to yearly menu bar Skip to main content


Poster
in
Workshop: MATH-AI: The 4th Workshop on Mathematical Reasoning and AI

miniCTX: Neural Theorem Proving with (Long-)Contexts

Jiewen Hu · Thomas Zhu · Sean Welleck

Keywords: [ Benchmark dataset ] [ neural theorem proving ] [ Formal mathematics ]


Abstract: Real-world formal theorem proving often depends on rich prior contexts, including definitions, lemmas, comments, file structure, and other information. We introduce $\texttt{miniCTX}$, which tests a model's ability to prove formal mathematical theorems that depend on new context not encountered in training. $\texttt{miniCTX}$ contains theorems sourced from real Lean projects and textbooks, each associated with a context that can span tens of thousands of tokens. Models are tasked with proving a theorem given access to code from the theorem's repository, which contains context that is needed for the proof. As a baseline for $\texttt{miniCTX}$, we tested fine-tuning and prompting methods that condition theorem proving on preceding context. Both approaches substantially outperform traditional methods that rely solely on state information. We found that this ability to utilize context is not captured by previous benchmarks such as $\texttt{miniF2F}$. Alongside $\texttt{miniCTX}$, we offer $\texttt{ntp-toolkit}$ for automatically extracting and annotating theorem proving data, making it easy to add new projects into $\texttt{miniCTX}$ to ensure that contexts are not seen during training. $\texttt{miniCTX}$ offers a challenging and realistic evaluation of neural theorem provers.

Chat is not available.