Oral in Workshop: Evaluating Evaluations: Examining Best Practices for Measuring Broader Impacts of Generative AI

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark

Shota Onohara · Atsuyuki Miyai · Yuki Imajuku · Kazuki Egashira · Jeonghun Baek · Xiang Yue · Graham Neubig · Kiyoharu Aizawa

Keywords: [ Large Multimodal Models ] [ Japanese Benchmark ] [ Culture-aware Benchmark ]


Abstract:

We introduce JMMMU (Japanese MMMU), an expert-level benchmark designed to rigorously evaluate the performance of large multimodal models (LMMs) in Japanese. Compared to existing Japanese multimodal benchmarks, JMMMU demands a deep understanding of Japanese culture and advanced reasoning skills, and it contains more than ten times as many questions, enabling more reliable quantitative evaluation. We hope our findings will inspire the creation of high-standard benchmarks in more languages and pave the way for LMM development that is more inclusive of non-English languages.