Poster
in
Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning
Estimating Numbers without Regression
Avijit Thawani · Jay Pujara · Ashwin Kalyan
Despite recent successes in language models, their ability to represent numbers remains insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them onto a number line, whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbers at various stages of the language modeling pipeline. These methods change either (1) the notation in which numbers are written (e.g., scientific vs. decimal), (2) the vocabulary used to represent numbers, or (3) the entire architecture of the underlying language model, to directly regress to a desired number. In this work, we show that an alternative to the more complex architectural changes is to simply change the model's vocabulary instead, e.g., introduce a new token for numbers in the range 10-100. In the context of masked number prediction, we find that a carefully designed tokenization scheme is both the simplest to implement and sufficient, i.e., it performs comparably to the state-of-the-art approach that requires significant architectural changes. Finally, we evaluate the various number representation schemes on the downstream task of numerical fact estimation (for Fermi Problems) in a zero-shot setting and find similar trends, i.e., changes at the tokenization level achieve near state-of-the-art results while requiring minimal resources compared to other number representation schemes.
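The vocabulary change described above, binning numbers by magnitude so that, for instance, every number between 10 and 100 maps to one shared token, can be sketched as follows. This is a minimal illustration of the idea, not the paper's actual tokenizer; the token naming scheme here is hypothetical.

```python
import math

def magnitude_token(value: float) -> str:
    """Map a number to a single magnitude-bin vocabulary token.

    Illustrative sketch: one token per order of magnitude, so every
    number in [10, 100) becomes the same token. Token names such as
    "[NUM_1e1]" are assumptions for this example, not the paper's
    actual vocabulary entries.
    """
    if value == 0:
        return "[NUM_0]"
    # Order of magnitude determines the bin, mirroring a number line.
    exponent = math.floor(math.log10(abs(value)))
    sign = "-" if value < 0 else ""
    return f"[{sign}NUM_1e{exponent}]"
```

For example, 42 and 99 collapse to the same token, while 420 lands in the next bin, so the vocabulary explicitly encodes magnitude rather than arbitrary subword chunks.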