Datasets and Benchmarks
Dataset and Benchmark Poster Session 3
Joaquin Vanschoren · Serena Yeung
Moderator: Alice Oh
The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, so full anonymization is not required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.
Schedule
- Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management (Poster)
  Cécile Logé · Emily Ross · David Dadey · Saahil Jain · Adriel Saporta · Andrew Ng · Pranav Rajpurkar
- Modeling Worlds in Text (Poster)
  Prithviraj Ammanabrolu · Mark Riedl
- OmniPrint: A Configurable Printed Character Synthesizer (Poster)
  Haozhe Sun · Wei-Wei Tu · Isabelle Guyon
- Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics (Poster)
  Charan Reddy · Deepak Sharma · Soroush Mehri · Adriana Romero Soriano · Samira Shabanian · Sina Honari
- An Extensible Benchmark Suite for Learning to Simulate Physical Systems (Poster)
  Karl Otness · Arvi Gjoka · Joan Bruna · Daniele Panozzo · Benjamin Peherstorfer · Teseo Schneider · Denis Zorin
- The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions (Poster)
  Jennifer J Sun · Tomomi Karigo · Dipam Chakraborty · Sharada Mohanty · Benjamin Wild · Quan Sun · Chen Chen · David Anderson · Pietro Perona · Yisong Yue · Ann Kennedy
- Reinforcement Learning Benchmarks for Traffic Signal Control (Poster)
  James Ault · Guni Sharon
- MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research (Poster)
  Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel
- Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks (Poster)
  Georgios Papoudakis · Filippos Christianos · Lukas Schäfer · Stefano Albrecht
- Which priors matter? Benchmarking models for learning latent dynamics (Poster)
  Aleksandar Botev · Andrew Jaegle · Peter Wirnsberger · Daniel Hennes · Irina Higgins
- The Neural MMO Platform for Massively Multiagent Research (Poster)
  Joseph Suarez · Yilun Du · Clare Zhu · Igor Mordatch · Phillip Isola
- A Procedural World Generation Framework for Systematic Evaluation of Continual Learning (Poster)
  Timm Hess · Martin Mundt · Iuliia Pliushch · Visvanathan Ramesh
- Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation (Poster)
  Daniel Freeman · Erik Frey · Anton Raichuk · Sertan Girgin · Igor Mordatch · Olivier Bachem
- CCNLab: A Benchmarking Framework for Computational Cognitive Neuroscience (Poster)
  Nikhil Bhattasali · Momchil Tomov · Samuel J Gershman
- Addressing "Documentation Debt" in Machine Learning: A Retrospective Datasheet for BookCorpus (Poster)
  John Bandy · Nicholas Vincent
- Generating Datasets of 3D Garments with Sewing Patterns (Poster)
  Maria Korosteleva · Sung-Hee Lee
- Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing (Poster)
  Sarah Wiegreffe · Ana Marasovic
- B-Pref: Benchmarking Preference-Based Reinforcement Learning (Poster)
  Kimin Lee · Laura Smith · Anca Dragan · Pieter Abbeel
- Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks (Poster)
  Curtis Northcutt · Anish Athalye · Jonas Mueller
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification (Poster)
  Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant
- Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning (Poster)
  Cameron Voloshin · Hoang Le · Nan Jiang · Yisong Yue
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation (Poster)
  Chuang Gan · Jeremy Schwartz · Seth Alter · Damian Mrowca · Martin Schrimpf · James Traer · Julian De Freitas · Jonas Kubilius · Abhishek Bhandwaldar · Nick Haber · Megumi Sano · Kuno Kim · Elias Wang · Michael Lingelbach · Aidan Curtis · Kevin Feigelis · Daniel Bear · Dan Gutfreund · David Cox · Antonio Torralba · James J DiCarlo · Josh Tenenbaum · Josh McDermott · Dan Yamins
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines (Poster)
  Daniel Bear · Elias Wang · Damian Mrowca · Felix Binder · Hsiao-Yu Tung · Pramod RT · Cameron Holdaway · Sirui Tao · Kevin Smith · Fan-Yun Sun · Fei-Fei Li · Nancy Kanwisher · Josh Tenenbaum · Dan Yamins · Judith Fan
- CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms (Poster)
  Martin Pawelczyk · Sascha Bielawski · Johan Van den Heuvel · Tobias Richter · Gjergji Kasneci
- It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks (Poster)
  Michelle Bao · Angela Zhou · Samantha Zottola · Brian Brubach · Sarah Desmarais · Aaron Horowitz · Kristian Lum · Suresh Venkatasubramanian
- Automatic Construction of Evaluation Suites for Natural Language Generation Datasets (Poster)
  Simon Mille · Kaustubh Dhole · Saad Mahamood · Laura Perez-Beltrachini · Varun Prashant Gangal · Mihir Kale · Emiel van Miltenburg · Sebastian Gehrmann
- Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research (Poster)
  Bernard Koch · Emily Denton · Alex Hanna · Jacob G Foster
- Dynamic Environments with Deformable Objects (Poster)
  Rika Antonova · Peiyang Shi · Hang Yin · Zehang Weng · Danica Kragic
- An Empirical Investigation of Representation Learning for Imitation (Poster)
  Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
- OpenML Benchmarking Suites (Poster)
  Bernd Bischl · Giuseppe Casalicchio · Matthias Feurer · Pieter Gijsbers · Frank Hutter · Michel Lang · Rafael Gomes Mantovani · Jan van Rijn · Joaquin Vanschoren
- Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning (Poster)
  Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer
- RB2: Robotic Manipulation Benchmarking with a Twist (Poster)
  Sudeep Dasari · Jianren Wang · Joyce Hong · Shikhar Bahl · Yixin Lin · Austin Wang · Abitha Thankaraj · Karanbir Chahal · Berk Calli · Saurabh Gupta · David Held · Lerrel Pinto · Deepak Pathak · Vikash Kumar · Abhinav Gupta
- Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation (Poster)
  Alicia Curth · David Svensson · Jim Weatherall · Mihaela van der Schaar
- Chest ImaGenome Dataset for Clinical Reasoning (Poster)
  Joy T Wu · Nkechinyere Agu · Ismini Lourentzou · Arjun Sharma · Joseph Alexander Paguio · Jasper Seth Yao · Edward C Dee · William Mitchell · Satyananda Kashyap · Andrea Giovannini · Leo Anthony Celi · Mehdi Moradi
- Mitigating dataset harms requires stewardship: Lessons from 1000 papers (Poster)
  Kenneth Peng · Arunesh Mathur · Arvind Narayanan
- Artsheets for Art Datasets (Poster)
  Ramya Srinivasan · Emily Denton · Jordan Famularo · Negar Rostamzadeh · Fernando Diaz · Beth Coleman
- An Empirical Study of Graph Contrastive Learning (Poster)
  Yanqiao Zhu · Yichen Xu · Qiang Liu · Shu Wu
- Monash Time Series Forecasting Archive (Poster)
  Rakshitha W Godahewa · Christoph Bergmeir · Geoffrey Webb · Rob Hyndman · Pablo Montero-Manso
- Synthetic Benchmarks for Scientific Research in Explainable Machine Learning (Poster)
  Yang Liu · Sujay Khandagale · Colin White · Willie Neiswanger
- A Toolbox for Construction and Analysis of Speech Datasets (Poster)
  Evelina Bakhturina · Vitaly Lavrukhin · Boris Ginsburg
- Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee (Poster)
  Cedric Renggli · Luka Rimanic · Nora Hollenstein · Ce Zhang
- Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents (Poster)
  Jane Wang · Michael King · Nicolas Porcel · Zeb Kurth-Nelson · Tina Zhu · Charles Deck · Peter Choy · Mary Cassin · Malcolm Reynolds · Francis Song · Gavin Buttimore · David Reichert · Neil Rabinowitz · Loic Matthey · Demis Hassabis · Alexander Lerchner · Matt Botvinick
- FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark (Poster)
  Mingjie Li · Wenjia Cai · Rui Liu · Yuetian Weng · Xiaoyun Zhao · Cong Wang · Xin Chen · Zhong Liu · Caineng Pan · Mengke Li · Yingfeng Zheng · Yizhi Liu · Flora Salim · Karin Verspoor · Xiaodan Liang · Xiaojun Chang
- An Information Retrieval Approach to Building Datasets for Hate Speech Detection (Poster)
  Md Mustafizur Rahman · Dinesh Balakrishnan · Dhiraj Murthy · Mucahid Kutlu · Matt Lease
- Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation (Poster)
  Yuta Saito · Shunsuke Aihara · Megumi Matsutani · Yusuke Narita
- ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations (Poster)
  Tongzhou Mu · Zhan Ling · Fanbo Xiang · Derek Yang · Xuanlin Li · Stone Tao · Zhiao Huang · Zhiwei Jia · Hao Su
- AI and the Everything in the Whole Wide World Benchmark (Poster)
  Deborah Raji · Emily Denton · Emily M. Bender · Alex Hanna · Amandalynne Paullada
- Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning (Poster)
  Thomas Liao · Rohan Taori · Deborah Raji · Ludwig Schmidt
- Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning (Poster)
  Viktor Makoviychuk · Lukasz Wawrzyniak · Yunrong Guo · Michelle Lu · Kier Storey · Miles Macklin · David Hoeller · Nikita Rudin · Arthur Allshire · Ankur Handa · Gavriel State
- Hardware Design and Accurate Simulation of Structured-Light Scanning for Benchmarking of 3D Reconstruction Algorithms (Poster)
  Sebastian Koch · Yurii Piadyk · Markus Worchel · Marc Alexa · Claudio Silva · Denis Zorin · Daniele Panozzo
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation (Poster)
  Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar
- URLB: Unsupervised Reinforcement Learning Benchmark (Poster)
  Misha Laskin · Denis Yarats · Hao Liu · Kimin Lee · Albert Zhan · Kevin Lu · Catherine Cang · Lerrel Pinto · Pieter Abbeel
- What Would Jiminy Cricket Do? Towards Agents That Behave Morally (Poster)
  Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt