NeurIPS Datasets and Benchmarks: Dataset and Benchmark Poster Session 3

Joaquin Vanschoren · Serena Yeung

Moderator: Alice Oh

Abstract:

The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.

Schedule

-	Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management (Poster)	Cécile Logé · Emily Ross · David Dadey · Saahil Jain · Adriel Saporta · Andrew Ng · Pranav Rajpurkar
-	Modeling Worlds in Text (Poster)	Prithviraj Ammanabrolu · Mark Riedl
-	OmniPrint: A Configurable Printed Character Synthesizer (Poster)	Haozhe Sun · Wei-Wei Tu · Isabelle Guyon
-	Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics (Poster)	Charan Reddy · Deepak Sharma · Soroush Mehri · Adriana Romero Soriano · Samira Shabanian · Sina Honari
-	An Extensible Benchmark Suite for Learning to Simulate Physical Systems (Poster)	Karl Otness · Arvi Gjoka · Joan Bruna · Daniele Panozzo · Benjamin Peherstorfer · Teseo Schneider · Denis Zorin
-	The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions (Poster)	Jennifer J Sun · Tomomi Karigo · Dipam Chakraborty · Sharada Mohanty · Benjamin Wild · Quan Sun · Chen Chen · David Anderson · Pietro Perona · Yisong Yue · Ann Kennedy
-	Reinforcement Learning Benchmarks for Traffic Signal Control (Poster)	James Ault · Guni Sharon
-	MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research (Poster)	Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel
-	Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks (Poster)	Georgios Papoudakis · Filippos Christianos · Lukas Schäfer · Stefano Albrecht
-	Which priors matter? Benchmarking models for learning latent dynamics (Poster)	Aleksandar Botev · Andrew Jaegle · Peter Wirnsberger · Daniel Hennes · Irina Higgins
-	The Neural MMO Platform for Massively Multiagent Research (Poster)	Joseph Suarez · Yilun Du · Clare Zhu · Igor Mordatch · Phillip Isola
-	A Procedural World Generation Framework for Systematic Evaluation of Continual Learning (Poster)	Timm Hess · Martin Mundt · Iuliia Pliushch · Visvanathan Ramesh
-	Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation (Poster)	Daniel Freeman · Erik Frey · Anton Raichuk · Sertan Girgin · Igor Mordatch · Olivier Bachem
-	CCNLab: A Benchmarking Framework for Computational Cognitive Neuroscience (Poster)	Nikhil Bhattasali · Momchil Tomov · Samuel J Gershman
-	Addressing "Documentation Debt" in Machine Learning: A Retrospective Datasheet for BookCorpus (Poster)	John Bandy · Nicholas Vincent
-	Generating Datasets of 3D Garments with Sewing Patterns (Poster)	Maria Korosteleva · Sung-Hee Lee
-	Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing (Poster)	Sarah Wiegreffe · Ana Marasovic
-	B-Pref: Benchmarking Preference-Based Reinforcement Learning (Poster)	Kimin Lee · Laura Smith · Anca Dragan · Pieter Abbeel
-	Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks (Poster)	Curtis Northcutt · Anish Athalye · Jonas Mueller
-	CommonsenseQA 2.0: Exposing the Limits of AI through Gamification (Poster)	Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant
-	Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning (Poster)	Cameron Voloshin · Hoang Le · Nan Jiang · Yisong Yue
-	ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation (Poster)	Chuang Gan · Jeremy Schwartz · Seth Alter · Damian Mrowca · Martin Schrimpf · James Traer · Julian De Freitas · Jonas Kubilius · Abhishek Bhandwaldar · Nick Haber · Megumi Sano · Kuno Kim · Elias Wang · Michael Lingelbach · Aidan Curtis · Kevin Feigelis · Daniel Bear · Dan Gutfreund · David Cox · Antonio Torralba · James J DiCarlo · Josh Tenenbaum · Josh McDermott · Dan Yamins
-	Physion: Evaluating Physical Prediction from Vision in Humans and Machines (Poster)	Daniel Bear · Elias Wang · Damian Mrowca · Felix Binder · Hsiao-Yu Tung · Pramod RT · Cameron Holdaway · Sirui Tao · Kevin Smith · Fan-Yun Sun · Fei-Fei Li · Nancy Kanwisher · Josh Tenenbaum · Dan Yamins · Judith Fan
-	CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms (Poster)	Martin Pawelczyk · Sascha Bielawski · Johan Van den Heuvel · Tobias Richter · Gjergji Kasneci
-	It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks (Poster)	Michelle Bao · Angela Zhou · Samantha Zottola · Brian Brubach · Sarah Desmarais · Aaron Horowitz · Kristian Lum · Suresh Venkatasubramanian
-	Automatic Construction of Evaluation Suites for Natural Language Generation Datasets (Poster)	Simon Mille · Kaustubh Dhole · Saad Mahamood · Laura Perez-Beltrachini · Varun Prashant Gangal · Mihir Kale · Emiel van Miltenburg · Sebastian Gehrmann
-	Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research (Poster)	Bernard Koch · Emily Denton · Alex Hanna · Jacob G Foster
-	Dynamic Environments with Deformable Objects (Poster)	Rika Antonova · peiyang shi · Hang Yin · Zehang Weng · Danica Kragic
-	An Empirical Investigation of Representation Learning for Imitation (Poster)	Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
-	OpenML Benchmarking Suites (Poster)	Bernd Bischl · Giuseppe Casalicchio · Matthias Feurer · Pieter Gijsbers · Frank Hutter · Michel Lang · Rafael Gomes Mantovani · Jan van Rijn · Joaquin Vanschoren
-	Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning (Poster)	Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer
-	RB2: Robotic Manipulation Benchmarking with a Twist (Poster)	Sudeep Dasari · Jianren Wang · Joyce Hong · Shikhar Bahl · Yixin Lin · Austin Wang · Abitha Thankaraj · Karanbir Chahal · Berk Calli · Saurabh Gupta · David Held · Lerrel Pinto · Deepak Pathak · Vikash Kumar · Abhinav Gupta
-	Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation (Poster)	Alicia Curth · David Svensson · Jim Weatherall · Mihaela van der Schaar
-	Chest ImaGenome Dataset for Clinical Reasoning (Poster)	Joy T Wu · Nkechinyere Agu · Ismini Lourentzou · Arjun Sharma · Joseph Alexander Paguio · Jasper Seth Yao · Edward C Dee · William Mitchell · Satyananda Kashyap · Andrea Giovannini · Leo Anthony Celi · Mehdi Moradi
-	Mitigating dataset harms requires stewardship: Lessons from 1000 papers (Poster)	Kenneth Peng · Arunesh Mathur · Arvind Narayanan
-	Artsheets for Art Datasets (Poster)	Ramya Srinivasan · Emily Denton · Jordan Famularo · Negar Rostamzadeh · Fernando Diaz · Beth Coleman
-	An Empirical Study of Graph Contrastive Learning (Poster)	Yanqiao Zhu · Yichen Xu · Qiang Liu · Shu Wu
-	Monash Time Series Forecasting Archive (Poster)	Rakshitha W Godahewa · Christoph Bergmeir · Geoffrey Webb · Rob Hyndman · Pablo Montero-Manso
-	Synthetic Benchmarks for Scientific Research in Explainable Machine Learning (Poster)	Yang Liu · Sujay Khandagale · Colin White · Willie Neiswanger
-	A Toolbox for Construction and Analysis of Speech Datasets (Poster)	Evelina Bakhturina · Vitaly Lavrukhin · Boris Ginsburg
-	Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee (Poster)	Cedric Renggli · Luka Rimanic · Nora Hollenstein · Ce Zhang
-	Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents (Poster)	Jane Wang · Michael King · Nicolas Porcel · Zeb Kurth-Nelson · Tina Zhu · Charles Deck · Peter Choy · Mary Cassin · Malcolm Reynolds · Francis Song · Gavin Buttimore · David Reichert · Neil Rabinowitz · Loic Matthey · Demis Hassabis · Alexander Lerchner · Matt Botvinick
-	FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark (Poster)	Mingjie Li · Wenjia Cai · Rui Liu · Yuetian Weng · Xiaoyun Zhao · Cong Wang · Xin Chen · Zhong Liu · Caineng Pan · Mengke Li · yingfeng zheng · Yizhi Liu · Flora Salim · Karin Verspoor · Xiaodan Liang · Xiaojun Chang
-	An Information Retrieval Approach to Building Datasets for Hate Speech Detection (Poster)	Md Mustafizur Rahman · Dinesh Balakrishnan · Dhiraj Murthy · Mucahid Kutlu · Matt Lease
-	Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation (Poster)	Yuta Saito · Shunsuke Aihara · Megumi Matsutani · Yusuke Narita
-	ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations (Poster)	Tongzhou Mu · Zhan Ling · Fanbo Xiang · Derek Yang · Xuanlin Li · Stone Tao · Zhiao Huang · Zhiwei Jia · Hao Su
-	AI and the Everything in the Whole Wide World Benchmark (Poster)	Deborah Raji · Emily Denton · Emily M. Bender · Alex Hanna · Amandalynne Paullada
-	Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning (Poster)	Thomas Liao · Rohan Taori · Deborah Raji · Ludwig Schmidt
-	Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning (Poster)	Viktor Makoviychuk · Lukasz Wawrzyniak · Yunrong Guo · Michelle Lu · Kier Storey · Miles Macklin · David Hoeller · Nikita Rudin · Arthur Allshire · Ankur Handa · Gavriel State
-	Hardware Design and Accurate Simulation of Structured-Light Scanning for Benchmarking of 3D Reconstruction Algorithms (Poster)	Sebastian Koch · Yurii Piadyk · Markus Worchel · Marc Alexa · Claudio Silva · Denis Zorin · Daniele Panozzo
-	The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation (Poster)	Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar
-	URLB: Unsupervised Reinforcement Learning Benchmark (Poster)	Misha Laskin · Denis Yarats · Hao Liu · Kimin Lee · Albert Zhan · Kevin Lu · Catherine Cang · Lerrel Pinto · Pieter Abbeel
-	What Would Jiminy Cricket Do? Towards Agents That Behave Morally (Poster)	Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt