Workshop
Offline Reinforcement Learning
Rishabh Agarwal · Aviral Kumar · George Tucker · Justin Fu · Nan Jiang · Doina Precup
Tue 14 Dec, 9 a.m. PST
Offline reinforcement learning (RL) is a re-emerging area of study that aims to learn behaviors using only logged data, such as data from previous experiments or human demonstrations, without further environment interaction. It promises to enable progress on a number of real-world decision-making problems where active data collection is expensive (e.g., robotics, drug discovery, dialogue generation, recommendation systems) or unsafe (e.g., healthcare, autonomous driving, education), and thereby to resolve a key obstacle to bringing RL algorithms out of constrained lab settings and into the real world. The first edition of this workshop, held at NeurIPS 2020, focused on and spurred algorithmic development in offline RL. This year we shift the focus from algorithm design to bridging the gap between offline RL research and real-world applications, creating a space for discussion between researchers and practitioners on the topics that matter for deploying offline RL methods in practice. To that end, we have revised the workshop's topics and themes, invited new speakers working on application-focused areas, and, building on last year's lively panel discussion, invited last year's panelists to a retrospective panel on how their perspectives have changed.
For details on submission please visit: https://offline-rl-neurips.github.io/2021 (Submission deadline: October 6, Anywhere on Earth)
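To make the paradigm concrete: offline RL fits a policy or value function by sweeping over a fixed batch of logged transitions, never querying the environment. The following is a minimal illustrative sketch (the toy MDP, dataset, and hyperparameters are invented for this example, not taken from any workshop paper) using tabular Q-learning on a static dataset:

```python
# Minimal sketch of offline RL: tabular Q-learning over a fixed batch of
# logged (state, action, reward, next_state, done) transitions, with no
# further environment interaction. Toy example for illustration only.
from collections import defaultdict

def offline_q_learning(dataset, gamma=0.9, alpha=0.5, epochs=200):
    """Fit Q by repeatedly sweeping over a static dataset of transitions."""
    q = defaultdict(float)
    actions = sorted({a for (_, a, _, _, _) in dataset})
    for _ in range(epochs):
        for s, a, r, s2, done in dataset:
            # Bellman backup computed purely from logged data.
            target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

# Logged data from a toy 2-state chain: action 1 in state 0 reaches the
# rewarding terminal state, action 0 stays put with no reward.
data = [
    (0, 0, 0.0, 0, False),
    (0, 1, 1.0, 1, True),
    (0, 1, 1.0, 1, True),
]
q = offline_q_learning(data)
best_action = max([0, 1], key=lambda a: q[(0, a)])  # recovers action 1
```

Note that naively maximizing over actions poorly covered by the dataset causes value overestimation in realistic settings; this is precisely the failure mode that conservative methods such as Conservative Q-Learning (featured in the program below) are designed to address.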
Speakers:
Aviv Tamar (Technion - Israel Inst. of Technology)
Angela Schoellig (University of Toronto)
Barbara Engelhardt (Princeton University)
Sham Kakade (University of Washington/Microsoft)
Minmin Chen (Google)
Philip S. Thomas (UMass Amherst)
Schedule
Tue 9:00 a.m. - 9:10 a.m. | Opening Remarks | Rishabh Agarwal · Aviral Kumar
Tue 9:10 a.m. - 9:40 a.m. | Learning to Explore From Data (Talk) | Aviv Tamar
Tue 9:40 a.m. - 9:45 a.m. | Q&A for Aviv Tamar | Aviv Tamar
Tue 9:45 a.m. - 9:55 a.m. | Contributed Talk 1: What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Ajay Mandlekar
Tue 10:00 a.m. - 10:10 a.m. | Contributed Talk 2: What Would the Expert do?: Causal Imitation Learning | Gokul Swamy
Tue 10:15 a.m. - 10:25 a.m. | Contributed Talk 3: Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation | Yunzong Xu · Akshay Krishnamurthy · David Simchi-Levi
Tue 10:30 a.m. - 10:40 a.m. | Contributed Talk 4: PulseRL: Enabling Offline Reinforcement Learning for Digital Marketing Systems via Conservative Q-Learning | Luckeciano Carvalho Melo
Tue 10:40 a.m. - 11:45 a.m. | Poster Session 1
Tue 11:45 a.m. - 11:46 a.m. | Speaker Intro | Rishabh Agarwal · Aviral Kumar
Tue 11:46 a.m. - 12:16 p.m. | Offline RL for Robotics (Talk) | Angela Schoellig
Tue 12:16 p.m. - 12:21 p.m. | Q&A for Angela Schoellig
Tue 12:21 p.m. - 12:22 p.m. | Speaker Intro | Rishabh Agarwal · Aviral Kumar
Tue 12:22 p.m. - 12:52 p.m. | Generalization Theory in Offline RL (Talk) | Sham Kakade
Tue 12:52 p.m. - 12:57 p.m. | Q&A for Sham Kakade | Sham Kakade
Tue 1:00 p.m. - 2:00 p.m. | Invited Speaker Panel | Sham Kakade · Minmin Chen · Philip Thomas · Angela Schoellig · Barbara Engelhardt · Doina Precup · George Tucker
Tue 2:00 p.m. - 3:00 p.m. | Retrospective Panel | Sergey Levine · Nando de Freitas · Emma Brunskill · Finale Doshi-Velez · Nan Jiang · Rishabh Agarwal
Tue 3:00 p.m. - 3:01 p.m. | Speaker Intro | Aviral Kumar · George Tucker
Tue 3:01 p.m. - 3:31 p.m. | Offline RL for Recommendation Systems (Talk) | Minmin Chen
Tue 3:31 p.m. - 3:36 p.m. | Q&A for Minmin Chen | Minmin Chen
Tue 4:06 p.m. - 4:07 p.m. | Speaker Intro | Aviral Kumar · George Tucker
Tue 4:07 p.m. - 4:37 p.m. | Offline Reinforcement Learning for Hospital Patients When Every Patient is Different (Talk) | Barbara Engelhardt
Tue 4:37 p.m. - 4:42 p.m. | Q&A for Barbara Engelhardt
Tue 4:42 p.m. - 4:43 p.m. | Speaker Intro
Tue 4:43 p.m. - 5:13 p.m. | Advances in (High-Confidence) Off-Policy Evaluation (Talk) | Philip Thomas
Tue 5:13 p.m. - 5:19 p.m. | Q&A for Philip Thomas | Philip Thomas
Tue 5:19 p.m. - 5:20 p.m. | Closing Remarks & Poster Session
Tue 5:20 p.m. - 6:20 p.m. | Poster Session 2
Posters
- Offline Reinforcement Learning with Soft Behavior Regularization | Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin
- Instance-dependent Offline Reinforcement Learning: From tabular RL to linear MDPs | Ming Yin · Yu-Xiang Wang
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning | Daniel Seita · Abhinav Gopal · Mandi Zhao · John Canny
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Ajay Mandlekar · Danfei Xu · Josiah Wong · Chen Wang · Li Fei-Fei · Silvio Savarese · Yuke Zhu · Roberto Martín-Martín
- TiKick: Toward Playing Multi-agent Football Full Games from Single-agent Demonstrations | Shiyu Huang · Wenze Chen · Longfei Zhang · Shizhen Xu · Ziyang Li · Fengming Zhu · Deheng Ye · Ting Chen · Jun Zhu
- d3rlpy: An Offline Deep Reinforcement Learning Library | Takuma Seno · Michita Imai
- PulseRL: Enabling Offline Reinforcement Learning for Digital Marketing Systems via Conservative Q-Learning | Luckeciano Carvalho Melo · Luana G B Martins · Bryan Lincoln de Oliveira · Bruno Brandão · Douglas Winston Soares · Telma Lima
- Latent Geodesics of Model Dynamics for Offline Reinforcement Learning | Guy Tennenholtz · Nir Baram · Shie Mannor
- Domain Knowledge Guided Offline Q Learning | Xiaoxuan Zhang · Sijia Zhang · Yen-Yun Yu
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning | Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
- Unsupervised Learning of Temporal Abstractions using Slot-based Transformers | Anand Gopalakrishnan · Kazuki Irie · Jürgen Schmidhuber · Sjoerd van Steenkiste
- Counter-Strike Deathmatch with Large-Scale Behavioural Cloning | Tim Pearce · Jun Zhu
- Modern Hopfield Networks for Return Decomposition for Delayed Rewards | Michael Widrich · Markus Hofmarcher · Vihang Patil · Angela Bitto · Sepp Hochreiter
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage | Masatoshi Uehara · Wen Sun
- Importance of Representation Learning for Off-Policy Fitted Q-Evaluation | Xian Wu · Nevena Lazic · Dong Yin · Cosmin Paduraru
- Offline Contextual Bandits for Wireless Network Optimization | Miguel Suau
- Robust On-Policy Data Collection for Data-Efficient Policy Evaluation | Rujie Zhong · Josiah Hanna · Lukas Schäfer · Stefano Albrecht
- Doubly Pessimistic Algorithms for Strictly Safe Off-Policy Optimization | Sanae Amani · Lin Yang
- Offline RL with Resource Constrained Online Deployment | Jayanth Reddy Regatti · Aniket Anand Deshmukh · Young Jung · Abhishek Gupta · Urun Dogan
- Personalization for Web-based Services using Offline Reinforcement Learning | Pavlos A Apostolopoulos · Zehui Wang · Hanson Wang · Chad Zhou · Kittipat Virochsiri · Norm Zhou · Igor Markov
- Offline Reinforcement Learning with Implicit Q-Learning | Ilya Kostrikov · Ashvin Nair · Sergey Levine
- Pessimistic Model Selection for Offline Deep Reinforcement Learning | Huck Yang · Yifan Cui · Pin-Yu Chen
- BATS: Best Action Trajectory Stitching | Ian Char · Viraj Mehta · Adam Villaflor · John Dolan · Jeff Schneider
- Single-Shot Pruning for Offline Reinforcement Learning | Samin Yeasar Arnob · Sergey Plis · Doina Precup
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization | Thanh Nguyen-Tang · Sunil Gupta · A. Tuan Nguyen · Svetha Venkatesh
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions | Bogdan Mazoure · Ilya Kostrikov · Ofir Nachum · Jonathan Tompson
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning | Yi Zhao · Rinu Boney · Alexander Ilin · Juho Kannala · Joni Pajarinen
- What Would the Expert $do(\cdot)$?: Causal Imitation Learning | Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu
- Quantile Filtered Imitation Learning | David Brandfonbrener · Will Whitney · Rajesh Ranganath · Joan Bruna
- Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | Yuwei Fu · Di Wu · Benoit Boulet
- Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning | Utkarsh A Mishra · Soumya Samineni · Aditya Varma Sagi · Shalabh Bhatnagar · Shishir N Y
- MBAIL: Multi-Batch Best Action Imitation Learning utilizing Sample Transfer and Policy Distillation | Di Wu · David Meger · Michael Jenkin · Steve Liu · Gregory Dudek
- Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters | Vladislav Kurenkov · Sergey Kolesnikov
- Offline Reinforcement Learning with Munchausen Regularization | Hsin-Yu Liu · Bharathan Balaji · Dezhi Hong
- Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning | Samin Yeasar Arnob · Riashat Islam · Doina Precup
- Discrete Uncertainty Quantification Approach for Offline RL | Javier Corrochano · Rubén Majadas · Fernando Fernandez
- Pretraining for Language-Conditioned Imitation with Transformers | Aaron Putterman · Kevin Lu · Igor Mordatch · Pieter Abbeel
- Stateful Offline Contextual Policy Evaluation and Learning | Angela Zhou
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation | Dylan Foster · Akshay Krishnamurthy · David Simchi-Levi · Yunzong Xu
- Learning Value Functions from Undirected State-only Experience | Matthew Chang · Arjun Gupta · Saurabh Gupta
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations | Haoran Xu · Xianyuan Zhan · Honglei Yin
- Model-Based Offline Planning with Trajectory Pruning | Xianyuan Zhan · Xiangyu Zhu · Haoran Xu
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data | Mengjiao (Sherry) Yang · Sergey Levine · Ofir Nachum
- Offline Meta-Reinforcement Learning for Industrial Insertion | Tony Zhao · Jianlan Luo · Oleg Sushkov · Rugile Pevceviciute · Nicolas Heess · Jonathan Scholz · Stefan Schaal · Sergey Levine
- Sim-to-Real Interactive Recommendation via Off-Dynamics Reinforcement Learning | Junda Wu · Zhihui Xie · Tong Yu · Qizhi Li · Shuai Li
- Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters | Kamyar Ghasemipour · Shixiang (Shane) Gu · Ofir Nachum
- Example-Based Offline Reinforcement Learning without Rewards | Kyle Hatch · Tianhe Yu · Rafael Rafailov · Chelsea Finn
- The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks | Rafael Rafailov · Tianhe Yu · Avi Singh · Mariano Phielipp · Chelsea Finn