Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges

CLOUD: A Scalable Scientific Foundation Model for Crystal Representation Learning

Changwen Xu · Zhu · Venkatasubramanian Viswanathan

Keywords: [ crystal property prediction ] [ symmetry-aware string representation ] [ scientific foundation model ]


Abstract:

Developing machine learning models for crystal property prediction has been hampered by the need for labeled data from costly experiments or Density Functional Theory (DFT), resulting in limited data size and poor generalization to new crystals. Foundation models (FMs) present a potential solution with their self-supervised pre-training on unlabeled datasets and scalable model performance. Yet, applying FMs to crystals is challenging because existing string representations fail to capture critical structural information and no scaling analysis exists for FMs specialized in materials science. Herein, we propose CrystaL fOUnDation model (CLOUD), a Transformer-based foundation model for crystal representation learning and property prediction. CLOUD utilizes a novel symmetry-aware string representation, eliminating the need for atomic coordinates or equivariant models. Pre-trained on million-scale crystal data, CLOUD is then fine-tuned and assessed on various downstream tasks, significantly outperforming other coordinate-free models on MatBench and MatBench Discovery. In addition, CLOUD achieves state-of-the-art (SOTA) or near-SOTA performance on UnconvBench for unconventional crystal property predictions. Furthermore, the pre-trained CLOUD demonstrates robust scaling with data and model size, which suggests CLOUD's potential as a scalable solution for crystal foundation models.
