Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design

Advancing the ColabFit Exchange towards a Web-scale Data Source for Machine Learning Interatomic Potentials

Eric Fuemmeler · Gregory Wolfe · Amit Gupta · Joshua Vita · Ellad Tadmor · Stefano Martiniani

Keywords: [ data ] [ database ] [ interatomic potential ] [ MLIP ]

presentation: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design
Sat 14 Dec 8:15 a.m. PST — 5:20 p.m. PST

Abstract:

Data-driven (DD) interatomic potentials (IPs) trained on large collections of first-principles calculations are rapidly becoming essential tools in computational materials science and chemistry, both for discovery pipelines and for atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats for IP development. This deficiency prevents the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and it also limits the development of more general, universal (perhaps even multi-source) IPs. To address this issue, last year we introduced the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is specifically designed for IP development. It has now grown to contain 369 datasets spanning nearly 400,000 unique chemistries. Here we discuss recent updates to the ColabFit Exchange, including data statistics for the ever-growing database, modifications to the data standard and database backend, and new tools for using the data in machine learning (ML) applications.
