Workshop
Foundation Models for Decision Making
Mengjiao (Sherry) Yang · Yilun Du · Jack Parker-Holder · Siddharth Karamcheti · Igor Mordatch · Shixiang (Shane) Gu · Ofir Nachum
Room 291 - 292
Sat 3 Dec, 6:50 a.m. PST
Humans acquire vision, language, and decision making abilities through years of experience, arguably corresponding to millions of video frames, audio clips, and interactions with the world. Following this data-driven approach, recent foundation models trained on large and diverse datasets have demonstrated emergent capabilities and fast adaptation to a wide range of downstream vision and language tasks (e.g., BERT, DALL-E, GPT-3, CLIP). Meanwhile, in the decision making and reinforcement learning (RL) literature, foundation models have yet to fundamentally shift the traditional paradigm in which an agent learns from its own or others’ collected experience, typically on a single task and with limited prior knowledge. Nevertheless, there has been a growing body of foundation-model-inspired research in decision making that often involves collecting large amounts of interactive data for self-supervised learning at scale. For instance, foundation models such as BERT and GPT-3 have been applied to modeling trajectory sequences of agent experience, and ever-larger datasets have been curated for learning multimodal, multitask, and generalist agents. These works demonstrate the potential benefits of foundation models on a broad set of decision making applications such as autonomous driving, healthcare systems, robotics, goal-oriented dialogue, and recommendation systems.
Despite early signs of success, foundation models for decision making remain largely underexplored, underutilized, and lacking solid empirical and theoretical grounding. The challenges faced by existing research are as follows:
1. Many traditional decision making benchmarks are (near-)Markovian (i.e., historyless), and this brings the value of sequence modeling into question. The true power of foundation models may require more complex tasks.
2. Decision making tasks are composed of multi-modal data. At minimum, the states (observations), actions, and rewards of a task are each of different types. Moreover, across different tasks, states and actions can be highly distinct (image vs. text observations, discrete vs. continuous actions).
3. Unlike models in vision and language, decision making agents can further interact with the environment to collect additional experience in conjunction with learning on existing data. How such an interactive component should be integrated with foundation models is not clear.
4. There already exists a large gap between theory and practice in decision making. Hastily applying large models to decision making might create an even greater gap.
Goal of the workshop: The goal of this workshop is to bring together the decision making community and the foundation models community in vision and language to confront the challenges in decision making at scale. The workshop will span high-level discussions on how foundation models can help decision making (if at all) and low-level algorithmic differences among decision making, vision, and language, which might lead to both opportunities and challenges for applying foundation models to decision making. More specific topics will include but are not limited to:
1. Common or distinct properties of vision, language, and decision making tasks that reaffirm or challenge the value of foundation models in decision making.
2. Introductions of, or proposals for, new benchmarks that facilitate better research on foundation models for decision making.
3. How decision making can benefit from techniques already popular for foundation models, such as autoregressive sequence models, diffusion models, contrastive pretraining, masked autoencoders, prompting, etc.
4. Lessons learned from developing engineering frameworks, datasets and benchmarks, and evaluation protocols for foundation models in vision and language, and how the decision making community can benefit from these lessons.
5. How foundation models relate to the theoretical foundations of sequential decision making.
Schedule
Sat 6:50 a.m. - 7:00 a.m. | Ofir Nachum: Opening Remarks (In-Person Introduction)
Sat 7:00 a.m. - 7:15 a.m. | Is Conditional Generative Modeling all you need for Decision-Making? (Oral Presentation)
Sat 7:15 a.m. - 7:30 a.m. | Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (Oral Presentation)
Sat 7:30 a.m. - 7:45 a.m. | VIMA: General Robot Manipulation with Multimodal Prompts (Oral Presentation)
Sat 7:45 a.m. - 8:00 a.m. | Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action (Oral Presentation)
Sat 8:00 a.m. - 8:30 a.m. | Gabriel Barth-Maron: Gato: A Generalist Agent (Invited Talk)
Sat 8:30 a.m. - 9:00 a.m. | Jim Fan: Open-Ended Embodied Agents with Internet-Scale Knowledge (Invited Talk)
Sat 9:00 a.m. - 9:30 a.m. | Leslie P. Kaelbling: What does an intelligent robot need to know? (Invited Talk)
Sat 9:30 a.m. - 10:00 a.m. | Dorsa Sadigh: Learning and Leveraging Foundation Models in Robotics (Invited Talk)
Sat 11:00 a.m. - 11:15 a.m. | REACT: Synergizing Reasoning and Acting in Language Models (Oral Presentation)
Sat 11:15 a.m. - 11:30 a.m. | Generative Pretraining for Black-Box Optimization (Oral Presentation)
Sat 11:30 a.m. - 11:45 a.m. | In-context Reinforcement Learning with Algorithm Distillation (Oral Presentation)
Sat 11:45 a.m. - 12:00 p.m. | Large Language Models Are Human-Level Prompt Engineers (Oral Presentation)
Sat 12:00 p.m. - 12:30 p.m. | Thomas Wolf: Unlocking Foundation Models for Embodied Learning – What Tools Will We Need? (Invited Talk)
Sat 12:30 p.m. - 1:00 p.m. | Machel Reid: On using pre-trained language models for reinforcement learning (Invited Talk)
Sat 1:00 p.m. - 1:30 p.m. | Deepak Pathak: Invited Talk (Invited Talk)
Sat 1:30 p.m. - 2:00 p.m. | Dale Schuurmans: Large Foundation Models and Reinforcement Learning (Invited Talk)
Sat 2:00 p.m. - 2:30 p.m. | Panel Discussion
Revealing the Bias in Large Language Models via Reward Structured Questions (Poster) | Ezgi Korkmaz
Intelligent Variable Selection for Branch & Bound Methods (Poster) | Priya Shanmugasundaram · Saurabh Jha · Sailendu Patra
Skill Decision Transformer (Poster) | Shyam Sudhakaran · Sebastian Risi
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pretraining (Poster) | Rogerio Bonatti · Sai Vemprala · shuang ma · Felipe Vieira Frujeri · Shuhang Chen · Ashish Kapoor
SMART: Self-supervised Multi-task pretrAining with contRol Transformers (Poster) | Yanchao Sun · shuang ma · Ratnesh Madaan · Rogerio Bonatti · Furong Huang · Ashish Kapoor
LATTE: LAnguage Trajectory TransformEr (Poster) | A Bucker · Luis Figueredo · Sami Haddadin · Ashish Kapoor · shuang ma · Sai Vemprala · Rogerio Bonatti
Build generally reusable agent-environment interaction models (Poster) | Jun Jin · Hongming Zhang · Jun Luo
Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains (Poster) | Pierre Chambon · Christian Bluethgen · Curtis Langlotz · Akshay Chaudhari
What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning? (Poster) | Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn
Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change) (Poster) | Karthik Valmeekam · Alberto Olmo · Sarath Sreedharan · Subbarao Kambhampati
A Control-Centric Benchmark for Video Prediction (Poster) | Stephen Tian · Chelsea Finn · Jiajun Wu
CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation (Poster) | Adithyavairavan Murali · Arsalan Mousavian · Clemens Eppner · Adam Fishman · Dieter Fox
Planning With Large Language Models Via Corrective Re-Prompting (Poster) | Shreyas Sundara Raman · Vanya Cohen · Eric Rosen · Ifrah Idrees · David Paulius · Stefanie Tellex
Decision Making as Language Generation (Poster) | Roland Memisevic · Sunny P Panchal · Mingu Lee
Multi-step Planning for Automated Hyperparameter Optimization with OptFormer (Poster) | Lucio M Dery · Abram Friesen · Nando de Freitas · Marc'Aurelio Ranzato · Yutian Chen
A Mixture-of-Expert Approach to RL-based Dialogue Management (Poster) | Yinlam Chow · Azamat Tulepbergenov · Ofir Nachum · Dhawal Gupta · Moonkyung Ryu · Mohammad Ghavamzadeh · Craig Boutilier
Foundation Models for Semantic Novelty in Reinforcement Learning (Poster) | Tarun Gupta · Peter Karkus · Tong Che · Danfei Xu · Marco Pavone
Large Language Models Are Human-Level Prompt Engineers (Poster) | Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks (Poster) | Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare
Return Augmentation gives Supervised RL Temporal Compositionality (Poster) | Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba
Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes (Poster) | Aviral Kumar · Rishabh Agarwal · XINYANG GENG · George Tucker · Sergey Levine
Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning (Poster) | Aviral Kumar · Anikait Singh · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine
Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints (Poster) | Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine
Planning with Large Language Models for Code Generation (Poster) | Shun Zhang · Zhenfang Chen · Yikang Shen · Mingyu Ding · Josh Tenenbaum · Chuang Gan
Learning Control by Iterative Inversion (Poster) | Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets (Poster) | David Venuto · Mengjiao (Sherry) Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum
Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task (Poster) | Jannik Kossen · Cătălina Cangea · Eszter Vértes · Andrew Jaegle · Viorica Patraucean · Ira Ktena · Nenad Tomasev · Danielle Belgrave
Foundation Models for History Compression in Reinforcement Learning (Poster) | Fabian Paischer · Thomas Adler · Andreas Radler · Markus Hofmarcher · Sepp Hochreiter
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks (Poster) | Albert Yu · Raymond Mooney
How crucial is Transformer in Decision Transformer? (Poster) | Max Siebenborn · Boris Belousov · Junning Huang · Jan Peters
Pareto-Efficient Decision Agents for Offline Multi-Objective Reinforcement Learning (Poster) | Baiting Zhu · Meihua Dang · Aditya Grover
Is Conditional Generative Modeling all you need for Decision-Making? (Poster) | Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal
Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning (Poster) | Dan Elbaz · Gal Novik · Oren Salzman
In-Context Policy Iteration (Poster) | Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh
In-context Reinforcement Learning with Algorithm Distillation (Poster) | Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih
Contextual Transformer for Offline Meta Reinforcement Learning (Poster) | Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang
Generative Pretraining for Black-Box Optimization (Poster) | Siddarth Krishnamoorthy · Satvik Mashkaria · Aditya Grover
Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization (Poster) | Lunjun Zhang · Bradly Stadie
REACT: Synergizing Reasoning and Acting in Language Models (Poster) | Shunyu Yao · Jeffrey Zhao · Dian Yu · Izhak Shafran · Karthik Narasimhan · Yuan Cao
ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning (Poster) | Tung Nguyen · Qinqing Zheng · Aditya Grover
Skill Acquisition by Instruction Augmentation on Offline Datasets (Poster) | Ted Xiao · Harris Chan · Pierre Sermanet · Ayzaan Wahid · Anthony Brohan · Karol Hausman · Sergey Levine · Jonathan Tompson
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning (Poster) | yifan xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu
CLaP: Conditional Latent Planners for Offline Reinforcement Learning (Poster) | Harry Shin · Rose Wang
Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action (Poster) | Dhruv Shah
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning (Poster) | Kevin Esslinger · Robert Platt · Christopher Amato
Control Graph as Unified IO for Morphology-Task Generalization (Poster) | Hiroki Furuta · Yusuke Iwasawa · Yutaka Matsuo · Shixiang (Shane) Gu
Hyper-Decision Transformer for Efficient Online Policy Adaptation (Poster) | Mengdi Xu · Yuchen Lu · Yikang Shen · Shun Zhang · DING ZHAO · Chuang Gan
Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (Poster) | Jason Yecheng Ma · Shagun Sodhani · Dinesh Jayaraman · Osbert Bastani · Vikash Kumar · Amy Zhang
VIMA: General Robot Manipulation with Multimodal Prompts (Poster) | Yunfan Jiang · Agrim Gupta · Zichen Zhang · Guanzhi Wang · Yongqiang Dou · Yanjun Chen · Fei-Fei Li · Anima Anandkumar · Yuke Zhu · Linxi Fan
Constrained MDPs can be Solved by Eearly-Termination with Recurrent Models (Poster) | Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou
Supervised Q-Learning can be a Strong Baseline for Continuous Control (Poster) | Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou
Solving PDDL Planning Problems with Pretrained Large Language Models (Poster) | Tom Silver · Varun Hariprasad · Reece Shuttleworth · Nishanth Kumar · Tomás Lozano-Pérez · Leslie Kaelbling
Collaborating with language models for embodied reasoning (Poster) | Ishita Dasgupta · Christine Kaeser-Chen · Kenneth Marino · Arun Ahuja · Sheila Babayan · Felix Hill · Rob Fergus
Elicitation Inference Optimization for Multi-Principal-Agent Alignment (Poster) | Andrew Konya · Yeping L Qiu · Michael Varga · Aviv Ovadya
LMPriors: Pre-Trained Language Models as Task-Specific Priors (Poster) | Kristy Choi · Chris Cundy · Sanjari Srivastava · Stefano Ermon