

Poster in Workshop: AI for Accelerated Materials Design (AI4Mat-2023)

Data Efficient Training for Materials Property Prediction Using Active Learning Querying

Carmelo Gonzales · Kin Long Kelvin Lee · Bin Mu · Michael Galkin · Santiago Miret

Keywords: [ data efficient ] [ machine learning ] [ active learning ]


Abstract:

The field of machine learning for materials property prediction and characterization is seeing rapid development in models, datasets, and frameworks. As datasets and models grow in size, frameworks must mature concurrently to meet the data requirements and fast development cycles that these growing workloads demand. Efficient model training is one area in which machine learning frameworks can be improved. Using active learning querying strategies to train models from scratch on less data can shorten development cycles, speed up model evaluation, and reduce training costs. Well-studied active learning querying strategies from computer vision and natural language processing are applied directly to train an E(n)-equivariant graph neural network (E(n)-GNN) from scratch on a subset of the Materials Project Database and the Novel Materials Discovery (NOMAD) Database, and the results are compared with data subset selection techniques and with the standard training pipeline. In general, models trained with active learning querying strategies meet or exceed the performance of models trained with the standard pipeline while using significantly less training data.
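The abstract does not say which querying strategies were used, so the following is only a minimal sketch of a pool-based active learning loop, assuming query-by-committee (pick the unlabeled samples on which bootstrap-trained models disagree most) as one common example strategy. Everything here is hypothetical and for illustration: a tiny k-nearest-neighbour regressor stands in for the E(n)-GNN, each sample is a toy `(descriptor, property)` pair rather than a crystal structure, and the names `fit_knn`, `committee_disagreement`, and `active_learning` are not from the paper.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy sample: (descriptor, target property). A real pipeline
# would use crystal graphs and an E(n)-GNN instead.
Sample = Tuple[float, float]


def fit_knn(train: Sequence[Sample], k: int = 3) -> Callable[[float], float]:
    """Tiny k-nearest-neighbour regressor standing in for the real model."""
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict


def committee_disagreement(labeled: List[Sample], pool: List[Sample],
                           n_members: int, rng: random.Random) -> List[float]:
    """Query-by-committee score: variance of predictions across bootstrap models."""
    members = [fit_knn([rng.choice(labeled) for _ in labeled])
               for _ in range(n_members)]
    return [statistics.pvariance([m(x) for m in members]) for x, _ in pool]


def active_learning(data: List[Sample], n_init: int, batch: int, rounds: int,
                    n_members: int = 5, seed: int = 0) -> List[Sample]:
    """Grow a labeled training set by repeatedly querying the most uncertain samples."""
    rng = random.Random(seed)
    pool = data[:]
    rng.shuffle(pool)
    # Start from a small random labeled set; the rest is the unlabeled pool.
    labeled, pool = pool[:n_init], pool[n_init:]
    for _ in range(rounds):
        if not pool:
            break
        scores = committee_disagreement(labeled, pool, n_members, rng)
        ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
        chosen = set(ranked[:batch])
        # "Labeling" here just moves samples with known targets into the train set.
        labeled += [pool[i] for i in sorted(chosen)]
        pool = [s for i, s in enumerate(pool) if i not in chosen]
    return labeled


if __name__ == "__main__":
    data = [(x / 10, (x / 10) ** 2) for x in range(100)]
    subset = active_learning(data, n_init=10, batch=5, rounds=4)
    model = fit_knn(subset)  # final model trained only on the queried subset
    print(len(subset), round(model(5.0), 2))
```

With `n_init=10`, `batch=5`, and `rounds=4`, the final model sees 30 of the 100 toy samples. The design choice being illustrated is that the labeled set is grown in rounds chosen by the current models' uncertainty rather than by uniform random sampling, which is what allows training on a fraction of the data.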
