Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design
MaCBench: A multimodal chemistry and materials science benchmark
Nawaf Alampara · Indrajeet Mandal · Pranav Khetarpal · Hargun Grover · Mara Schilling-Wilhelmi · N M Anoop Krishnan · Kevin Maik Jablonka
Keywords: [ LLM ] [ Benchmark ] [ Multimodal ] [ MLM ]
We present MaCBench, a multimodal benchmark for evaluating AI models on chemistry and materials science tasks. The benchmark addresses the lack of comprehensive, domain-specific evaluation tools for multimodal AI in scientific contexts. MaCBench comprises 628 questions across three key areas: fundamental scientific understanding, data extraction from visual information, and practical laboratory knowledge. It pairs diverse visual inputs, such as laboratory images, band structures, crystal structures, and atomic force microscopy images, with multiple-choice questions. We evaluate state-of-the-art multimodal AI models (GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro) on MaCBench and find significant performance variations across tasks and skills. While the models excel at basic pattern recognition and information retrieval, they struggle with complex reasoning and with applying scientific principles to novel situations. Notably, we observe a disconnect between object recognition and contextual understanding in laboratory safety scenarios. MaCBench provides insight into the capabilities and limitations of multimodal AI in chemistry and materials science, and can serve as a tool for guiding the development of more capable AI systems for scientific research.