Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design
Evaluating Chemistry Prompts for Large-Language Model Fine-Tuning
Carmelo Gonzales · Michael Pieler · Kevin Maik Jablonka · Santiago Miret
Keywords: [ Benchmarking ] [ Large Language Models ] [ Templating ] [ Prompting ] [ Fine-Tuning ]
We study large language model (LLM) templating and data presentation in chemistry and materials science by analyzing the memorization and generalization performance of a LLaMA model fine-tuned on 34 unique datasets. As application domains for LLMs become more specialized, understanding the impact of training data, templates, and evaluations grows increasingly important. While many pretrained LLMs have been trained on enormous corpora of text, they are not guaranteed to be useful for domain-specific tasks that involve specialized data and prompts, such as those in chemistry and materials science. To further probe the capabilities of LLMs, we evaluate the performance of various fine-tuned base models and show how differences in template style, combined with varying molecular string representations, affect model performance. We hope these insights serve as a helpful path towards future larger-scale training of chemistry- and materials-science-specific LLMs.
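To make the templating idea concrete, the following is a minimal sketch of how a chemistry fine-tuning prompt might be built by filling one template with different molecular string representations. The template wording, the property, and the example values are hypothetical illustrations, not taken from the paper's 34 datasets or its actual templates.

```python
# Minimal sketch (not the authors' exact pipeline): render one
# question-answer fine-tuning prompt per molecular representation style,
# so the same underlying record yields several template variants.

TEMPLATE = (
    "Question: What is the {property} of the molecule {representation}?\n"
    "Answer: {value}"
)

# Hypothetical record: the same molecule (ethanol) rendered as three
# common string representations, paired with an example property.
record = {
    "property": "boiling point",
    "value": "78.37 °C",
    "representations": {
        "SMILES": "CCO",
        "InChI": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
        "IUPAC name": "ethanol",
    },
}

def render_prompts(record: dict) -> dict:
    """Render one filled-in prompt per molecular representation style."""
    return {
        style: TEMPLATE.format(
            property=record["property"],
            representation=string,
            value=record["value"],
        )
        for style, string in record["representations"].items()
    }

if __name__ == "__main__":
    for style, prompt in render_prompts(record).items():
        print(f"--- {style} ---\n{prompt}\n")
```

Comparing a model fine-tuned on each variant of such prompts is one way to isolate how template style and representation choice, rather than the underlying data, drive memorization and generalization performance.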