Poster
in
Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges
Provable in-context learning of linear systems and linear elliptic PDEs with transformers
Frank Cole · Yulong Lu · Tianhao Zhang · Riley O'Neill
Keywords: [ transformer ] [ in-context learning ] [ elliptic PDE ] [ scientific foundation model ]
Foundation models for natural language processing, empowered by the transformer architecture, exhibit remarkable in-context learning (ICL) capabilities: pre-trained models can adapt to a downstream task by conditioning only on few-shot prompts, without updating the model weights. Recently, transformer-based foundation models have also emerged as universal tools for solving scientific problems, most notably partial differential equations (PDEs). However, the theoretical underpinnings of the ICL capabilities of these models remain elusive. This work develops a rigorous error analysis for transformer-based ICL of the solution operators associated with a family of linear elliptic PDEs. Specifically, we show that a linear transformer, defined by a single linear self-attention layer, can provably learn in-context to invert the linear systems arising from the spatial discretization of the PDEs. We derive theoretical scaling laws for the proposed linear transformers in terms of the size of the spatial discretization, the number of training tasks, and the lengths of the prompts used during training and inference, under both the in-domain generalization setting and various distribution-shift settings. Empirically, we validate the ICL capabilities of transformers through extensive numerical experiments.
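To make the setup concrete, the sketch below is a minimal, illustrative example (not the authors' code) of the kind of experiment the abstract describes: prompts consist of (right-hand side, solution) pairs for a discretized 1D elliptic operator, and a single linear self-attention layer is trained to predict the solution for a query right-hand side. The specific PDE discretization, prompt layout, and the attention parametrization LSA(Z) = Z + P Z M (Z^T Q Z) / n are assumptions drawn from the standard linear-transformer ICL literature, not necessarily the exact construction analyzed in the paper.

```python
# Minimal sketch of in-context learning of linear systems with one linear
# self-attention layer. All modeling choices here are illustrative assumptions.
import torch

torch.manual_seed(0)

N = 8           # spatial grid size (dimension of the discretized linear system)
n_ctx = 20      # number of in-context (rhs, solution) pairs per prompt
n_tasks = 2000  # number of training prompts (tasks)
d = 2 * N       # token dimension: stacked right-hand side and solution


def sample_prompt(n_pairs):
    """Sample one elliptic task A = -Laplacian_h + diag(c), c >= 0, and build a prompt.

    Tokens are columns [f_j; u_j] with A u_j = f_j; the last (query) token is
    [f_query; 0] and the target is u_query = A^{-1} f_query.
    """
    h = 1.0 / (N + 1)
    lap = (2.0 * torch.eye(N)
           - torch.diag(torch.ones(N - 1), 1)
           - torch.diag(torch.ones(N - 1), -1)) / h**2
    A = lap + torch.diag(torch.rand(N))          # random zeroth-order coefficient
    F = torch.randn(N, n_pairs + 1)              # right-hand sides (last one is the query)
    U = torch.linalg.solve(A, F)                 # exact solutions
    Z = torch.cat([F, U], dim=0)                 # tokens as columns, shape (2N, n+1)
    Z[N:, -1] = 0.0                              # hide the query solution
    return Z, U[:, -1]                           # prompt and query target


class LinearSelfAttention(torch.nn.Module):
    """One linear attention layer: LSA(Z) = Z + P Z M (Z^T Q Z) / n,
    where M masks out the query column so it cannot attend to itself."""

    def __init__(self, dim):
        super().__init__()
        self.P = torch.nn.Parameter(0.01 * torch.randn(dim, dim))  # value/projection matrix
        self.Q = torch.nn.Parameter(0.01 * torch.randn(dim, dim))  # merged key-query matrix

    def forward(self, Z):
        n = Z.shape[-1] - 1
        M = torch.eye(n + 1)
        M[-1, -1] = 0.0                          # mask the query token
        return Z + (self.P @ Z @ M @ (Z.T @ self.Q @ Z)) / n


model = LinearSelfAttention(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(n_tasks):
    Z, u_star = sample_prompt(n_ctx)
    pred = model(Z)[N:, -1]                      # read the prediction off the query token
    loss = torch.mean((pred - u_star) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Evaluate on a fresh task with a longer prompt than used in training.
with torch.no_grad():
    Z, u_star = sample_prompt(50)
    err = torch.norm(model(Z)[N:, -1] - u_star) / torch.norm(u_star)
    print(f"relative test error: {err:.3f}")
```

Varying N, n_tasks, and the prompt lengths in this sketch mirrors the quantities that appear in the scaling laws stated in the abstract; drawing the test tasks or right-hand sides from a different distribution than the training ones mimics the distribution-shift settings.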