Poster
in
Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design
Automated, LLM enabled extraction of synthesis details for reticular materials from scientific literature
Viviane Torres da Silva · Alexandre Rademaker · Krystelle Lionti · Ronaldo Giro · Geisa Lima · Sandro Fiorini · Marcelo Archanjo · Breno Carvalho · Rodrigo Neumann Barros Ferreira · Anaximandro Souza · João de Souza · Gabriela de Valnisio Ferreira de Andrade · Carmen Paz · Renato Cerqueira · Mathias Steiner
Keywords: [ scientific literature ] [ reticular material ] [ synthesis details ] [ LLM ] [ knowledge extraction ]
Automated knowledge extraction from scientific literature can potentially acceleratematerials discovery. We have investigated an approach for extracting synthesisprotocols for reticular materials from scientific literature using large languagemodels (LLMs). To that end, we introduce a Knowledge Extraction Pipeline (KEP)that automatizes LLM-assisted paragraph classification and information extraction.By applying prompt engineering with in-context learning (ICL) to a set of open-source LLMs, we demonstrate that LLMs can retrieve chemical information fromPDF documents, without the need for fine-tuning or training and at a reduced riskof hallucination. By comparing the performance of five open-source families ofLLMs in both paragraph classification and information extraction tasks, we observeexcellent model performance even if only few example paragraphs are included inthe ICL prompts. The results show the potential of the KEP approach for reducinghuman annotations and data curation efforts in automated scientific knowledgeextraction.