Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Patrick Emami · Zhaonan Li · Saumya Sinha · Truc Nguyen
Keywords: [ surrogate models ] [ multimodal text and timeseries models ] [ language-interfaced regression ]
Data-driven simulation surrogates help computational scientists study complex systems and can help inform impactful policy decisions. We introduce a learning framework for surrogate modeling in which language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natural language SysCaps and simulation runs, we use large language models (LLMs) to synthesize high-quality captions. Using our framework, we train multimodal text and timeseries regression models for two real-world simulators of complex energy systems. Our experiments demonstrate that language interfaces for real-world surrogate models are feasible, with accuracy comparable to standard baselines. We show qualitatively and quantitatively that SysCaps unlock text-prompt-style surrogate modeling and new generalization abilities beyond what was previously possible. We will release the generated SysCaps datasets and our code to support follow-on studies.