Oral
in
Workshop: Multimodal Algorithmic Reasoning Workshop
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu · Maan Qraitem · Charlie CJ Wong · Anisa N Majhi · Yutong Bai · Shiry Ginosar · Alison Gopnik · Kate Saenko
Sun 15 Dec 8:25 a.m. PST — 5:05 p.m. PST
This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children. A “visual analogy” is an abstract rule inferred from one image and applied to another. While benchmarks exist for testing visual reasoning in LMMs, they require advanced skills and omit the basic visual analogies that even young children can make. Inspired by developmental psychology, we propose a new benchmark of 1,400 visual transformations of everyday objects to test LMMs on visual analogical reasoning and compare them to children and adults. We structure the evaluation into three stages: identifying what changed (e.g., color, number), how it changed (e.g., added one object), and applying the rule to new scenarios. Our findings show that while GPT-4V, LLaVA-1.5, and MANTIS identify the “what” effectively, they struggle with quantifying the “how” and extrapolating the rule to new objects. In contrast, children and adults exhibit much stronger analogical reasoning at all three stages. Models fare better on simple visual attributes such as color and size; conversely, more complex tasks such as number, rotation, and reflection, which necessitate extensive cognitive processing and an understanding of extrinsic spatial properties of the physical world, present more significant challenges. Altogether, these findings highlight the limitations of training models on data consisting primarily of 2D images and text.