Workshop
Multimodal Algorithmic Reasoning Workshop
Anoop Cherian · Kuan-Chuan Peng · Suhas Lohit · Honglu Zhou · Kevin Smith · Tim Marks · Juan Carlos Niebles · Petar Veličković
West Exhibition Hall A
Sun 15 Dec, 8:25 a.m. PST
In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, and bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal to achieving true artificial general intelligence. The workshop emphasizes the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms or procedures for solving real-world tasks. Examples include algorithms that use multimodal foundational models for analysis, synthesis, and planning; new approaches to solving challenging vision-and-language mathematical (Olympiad-type) reasoning problems; deriving winning strategies in multimodal games; and procedures for using tools in robotic manipulation. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of multimodal learning and cognitive science, to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, and to inspire the audience to search for the missing rungs on the ladder to true intelligence.
Schedule
Sun 8:25 a.m. - 8:30 a.m.
|
Welcome
(
Introduction
)
>
SlidesLive Video |
Anoop Cherian 🔗 |
Sun 8:30 a.m. - 9:15 a.m.
|
Keynote: Prof. Joshua B. Tenenbaum
(
Invited Talk
)
>
SlidesLive Video |
Josh Tenenbaum 🔗 |
Sun 9:15 a.m. - 9:30 a.m.
|
Coffee Break
|
🔗 |
Sun 9:30 a.m. - 10:15 a.m.
|
Keynote: Learning Algorithms with GNNs and Transformers
(
Invited Talk
)
>
SlidesLive Video |
Stefanie Jegelka 🔗 |
Sun 10:15 a.m. - 10:25 a.m.
|
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
(
Oral
)
>
SlidesLive Video |
Eunice Yiu · Maan Qraitem · Charlie CJ Wong · Anisa N Majhi · Yutong Bai · Shiry Ginosar · Alison Gopnik · Kate Saenko 🔗 |
Sun 10:25 a.m. - 10:35 a.m.
|
AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
(
Oral
)
>
SlidesLive Video |
Sullam Jeoung · Goeric Huybrechts · Bhavana Ganesh · Aram Galstyan · Sravan Babu Bodapati 🔗 |
Sun 10:35 a.m. - 10:45 a.m.
|
Neural Networks for Abstraction & Reasoning
(
Oral
)
>
SlidesLive Video |
Mikel Bober-Irizar · Soumya Banerjee 🔗 |
Sun 11:00 a.m. - 11:45 a.m.
|
Keynote: Prioritizing Perception in Multimodal Language Models
(
Invited Talk
)
>
SlidesLive Video |
Ranjay Krishna 🔗 |
Sun 11:45 a.m. - 11:50 a.m.
|
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
(
Spotlight
)
>
SlidesLive Video |
Zirui Wang · Mengzhou Xia · Luxi He · Howard Chen · Yitao Liu · Richard Zhu · Kaiqu Liang · Xindi Wu · Haotian Liu · Sadhika Malladi · Alexis Chevalier · Sanjeev Arora · Danqi Chen |
Sun 11:50 a.m. - 11:55 a.m.
|
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
(
Spotlight
)
>
SlidesLive Video |
Mohammadmostafa Rostamkhani · Baktash Ansariogholbake · Hoorieh Sabzevari · Farzan Rahmani · Sauleh Eetemadi 🔗 |
Sun 11:55 a.m. - 12:00 p.m.
|
HAMMR: HierArchical MultiModal React agents for generic VQA
(
Spotlight
)
>
SlidesLive Video |
Lluis Castrejon · Thomas Mensink · Howard Zhou · Vittorio Ferrari · Andre Araujo · Jasper Uijlings 🔗 |
Sun 12:00 p.m. - 12:05 p.m.
|
Are Large-Language Models Graph Algorithmic Reasoners?
(
Spotlight
)
>
SlidesLive Video |
Alexander Taylor · Anthony Cuturrufo · Vishal Yathish · Mingyu Derek Ma · Wei Wang 🔗 |
Sun 12:05 p.m. - 12:10 p.m.
|
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
(
Spotlight
)
>
SlidesLive Video |
Sina Rismanchian · Yasaman Razeghi · Sameer Singh · Shayan Doroudi 🔗 |
Sun 12:10 p.m. - 12:15 p.m.
|
ENTER: Event Based Interpretable Reasoning for VideoQA
(
Spotlight
)
>
SlidesLive Video |
Hammad Ayyubi · Junzhang Liu · Zhecan Wang · Hani Alomari · Chia-Wei Tang · Ali Asgarov · Md. Atabuzzaman · Najibul Haque Sarker · Zaber Hakim · Shih-Fu Chang · Chris Thomas |
Sun 12:15 p.m. - 1:30 p.m.
|
Lunch Break
|
🔗 |
Sun 1:30 p.m. - 2:15 p.m.
|
Keynote: Training Robots to Think Harder
(
Invited Talk
)
>
SlidesLive Video |
Sergey Levine 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
(
Poster
)
>
|
Adriel Saporta · Aahlad Manas Puli · Mark Goldstein · Rajesh Ranganath 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Smart Vision-Language Reasoners
(
Poster
)
>
|
Denisa Olteanu Roberts · Lucas R Roberts 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
(
Poster
)
>
|
Shenghuan Sun · Alexander Schubert · Greg Goldgof · Zhiqing Sun · Tom Hartvigsen · Atul Butte · Ahmed Alaa 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
(
Poster
)
>
|
Rabiul Awal · Le Zhang · Aishwarya Agrawal 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
ViLAaD: Enhancing "Attracting and Dispersing" Source-Free Domain Adaptation with Vision and Language Model
(
Poster
)
>
|
Shuhei Tarashima · Xinqi Shu · Norio Tagawa 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Chitrarth: Bridging Vision and Language for a Billion People
(
Poster
)
>
|
Shaharukh Khan · Ayush Tarun · Abhinav Ravi · Ali Faraz · Praveen Kumar Pokala · Anagha Bhangare · Raja Kolla · Chandra Khatri · Shubham Agarwal 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
LVM-Net: Efficient Long-Form Video Reasoning
(
Poster
)
>
|
Saket Gurukar · Asim Kadav 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Vision-LLMs Can Fool Themselves with Self-Generated Text
(
Poster
)
>
|
Maan Qraitem · Nazia Tasnim · Piotr Teterwak · Kate Saenko · Bryan Plummer 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
LLAVIDAL: Benchmarking Large LAnguage VIsion Models for Daily Activities of Living
(
Poster
)
>
|
Rajatsubhra Chakraborty · Arkaprava Sinha · Dominick Reilly · Manish Kumar Govind · Pu Wang · Francois Bremond · Srijan Das 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
(
Poster
)
>
|
Sullam Jeoung · Goeric Huybrechts · Bhavana Ganesh · Aram Galstyan · Sravan Babu Bodapati 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
(
Poster
)
>
|
Eunice Yiu · Maan Qraitem · Charlie CJ Wong · Anisa N Majhi · Yutong Bai · Shiry Ginosar · Alison Gopnik · Kate Saenko 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Neural Networks for Abstraction & Reasoning
(
Poster
)
>
|
Mikel Bober-Irizar · Soumya Banerjee 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
(
Poster
)
>
|
Zirui Wang · Mengzhou Xia · Luxi He · Howard Chen · Yitao Liu · Richard Zhu · Kaiqu Liang · Xindi Wu · Haotian Liu · Sadhika Malladi · Alexis Chevalier · Sanjeev Arora · Danqi Chen |
Sun 2:15 p.m. - 4:15 p.m.
|
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions
(
Poster
)
>
|
Mohammadmostafa Rostamkhani · Baktash Ansariogholbake · Hoorieh Sabzevari · Farzan Rahmani · Sauleh Eetemadi 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
HAMMR: HierArchical MultiModal React agents for generic VQA
(
Poster
)
>
|
Lluis Castrejon · Thomas Mensink · Howard Zhou · Vittorio Ferrari · Andre Araujo · Jasper Uijlings 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
Are Large-Language Models Graph Algorithmic Reasoners?
(
Poster
)
>
|
Alexander Taylor · Anthony Cuturrufo · Vishal Yathish · Mingyu Derek Ma · Wei Wang 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
(
Poster
)
>
|
Sina Rismanchian · Yasaman Razeghi · Sameer Singh · Shayan Doroudi 🔗 |
Sun 2:15 p.m. - 4:15 p.m.
|
ENTER: Event Based Interpretable Reasoning for VideoQA
(
Poster
)
>
|
Hammad Ayyubi · Junzhang Liu · Zhecan Wang · Hani Alomari · Chia-Wei Tang · Ali Asgarov · Md. Atabuzzaman · Najibul Haque Sarker · Zaber Hakim · Shih-Fu Chang · Chris Thomas |
Sun 4:15 p.m. - 5:00 p.m.
|
Keynote: LLM Posteriors over Functions as a New Output Modality
(
Invited Talk
)
>
SlidesLive Video |
David Duvenaud 🔗 |
Sun 5:00 p.m. - 5:05 p.m.
|
Closing Remarks
(
Closing Remarks
)
>
SlidesLive Video |
Anoop Cherian 🔗 |