Poster

ReMI: A Dataset for Reasoning with Multiple Images

Mehran Kazemi · Nishanth Dikkala · Ankit Anand · Petar Devic · Ishita Dasgupta · Fangyu Liu · Bahare Fatemi · Pranjal Awasthi · Sreenivas Gollapudi · Dee Guo · Ahmed Qureshi

West Ballroom A-D #5104
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

With the continuous advancement of large language models (LLMs), it is essential to create new benchmarks to evaluate their expanding capabilities and identify areas for improvement. This work focuses on multi-image reasoning, an emerging capability in state-of-the-art LLMs. We introduce ReMI, a dataset designed to assess LLMs' ability to reason with multiple images. This dataset encompasses a diverse range of tasks, spanning various reasoning domains such as math, physics, logic, code, table/chart understanding, and spatial and temporal reasoning. It also covers a broad spectrum of characteristics found in multi-image reasoning scenarios. We have benchmarked several cutting-edge LLMs using ReMI and found a substantial gap between their performance and human-level proficiency. This highlights the challenges in multi-image reasoning and the need for further research. Our analysis also reveals the strengths and weaknesses of different models, shedding light on the types of reasoning that are currently attainable and areas where future models require improvement. We anticipate that ReMI will be a valuable resource for developing and evaluating more sophisticated LLMs capable of handling real-world multi-image understanding tasks.