Tue 8:30 a.m. - 8:45 a.m.
|
Opening Remarks
(
Opening Remarks
)
>
SlidesLive Video
|
Andrew Ng
🔗
|
Tue 8:45 a.m. - 9:00 a.m.
|
Workshop Overview
(
Talk
)
>
SlidesLive Video
|
Lora Aroyo
🔗
|
Tue 9:00 a.m. - 9:15 a.m.
|
Human Computer Interaction and Crowdsourcing for Data Centric AI
(
Keynote
)
>
SlidesLive Video
|
Michael Bernstein
🔗
|
Tue 9:15 a.m. - 9:25 a.m.
|
Past and Future of data centric AI
(
Invited Talk
)
>
SlidesLive Video
|
Olga Russakovsky
🔗
|
Tue 9:25 a.m. - 9:27 a.m.
|
Data Centric AI Competition
(
Intro
)
>
SlidesLive Video
|
Lynn He
🔗
|
Tue 9:27 a.m. - 9:29 a.m.
|
Data Centric AI Competition: Divakar Roy
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:29 a.m. - 9:31 a.m.
|
Data Centric AI Competition: Shashank Deshpande
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:31 a.m. - 9:33 a.m.
|
Data Centric AI Competition: Johnson Kuan
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:33 a.m. - 9:35 a.m.
|
Data Centric AI Competition: Rens Dimmendaal
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:35 a.m. - 9:37 a.m.
|
Data Centric AI Competition: Nidhish Shah
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:37 a.m. - 9:39 a.m.
|
A Data-Centric Approach for Training Deep Neural Networks with Less Data
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:40 a.m. - 9:50 a.m.
|
Q&A Lightning Talk - Benchmarking
(
Q&A Session
)
>
SlidesLive Video
|
Lynn He · Greg Diamos
🔗
|
Tue 9:49 a.m. - 9:52 a.m.
|
Lightning Talks - Benchmarks and Challenges
(
Intro
)
>
|
Vijay Janapa Reddi · Cody Coleman
🔗
|
Tue 9:50 a.m. - 9:52 a.m.
|
Few-Shot Image Classification Challenge On-Board OPS-SAT
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:52 a.m. - 9:54 a.m.
|
No News is Good News: A Critique of the One Billion Word Benchmark
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:54 a.m. - 9:56 a.m.
|
A Data-Centric Image Classification Benchmark
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:56 a.m. - 9:58 a.m.
|
On Data-centric Myths
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 9:58 a.m. - 10:00 a.m.
|
Human-inspired Data Centric Computer Vision
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:00 a.m. - 10:20 a.m.
|
Q&A Lightning Talk - Benchmarks and Challenges
(
Q&A Session
)
>
SlidesLive Video
|
Cody Coleman · Vijay Janapa Reddi
🔗
|
Tue 10:20 a.m. - 10:25 a.m.
|
Break
|
🔗
|
Tue 10:25 a.m. - 10:40 a.m.
|
DataPerf - Peter Mattson and Praveen Paritosh
(
Talk
)
>
SlidesLive Video
|
Peter Mattson
🔗
|
Tue 10:40 a.m. - 10:42 a.m.
|
Lightning Talks - Challenge Problems and Theory
(
Intro
)
>
|
Vijay Janapa Reddi · Carole-Jean Wu
🔗
|
Tue 10:42 a.m. - 10:44 a.m.
|
YMIR: A Rapid Data Development Platform for Long-tailed Vision Applications
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:44 a.m. - 10:46 a.m.
|
CircleNLU: A Tool for building Data-Driven Natural Language Understanding System
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:46 a.m. - 10:48 a.m.
|
AirSAS: Controlled Dataset Generation for Physics-Informed Machine Learning
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:48 a.m. - 10:50 a.m.
|
Lhotse: a speech data representation library for the modern deep learning ecosystem
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:50 a.m. - 10:52 a.m.
|
Data-Driven Deep Reinforcement Learning in Quantitative Finance
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:54 a.m. - 10:56 a.m.
|
Ground-Truth, Whose Truth? - Examining the Challenges with Annotating Toxic Text Datasets
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:56 a.m. - 10:58 a.m.
|
Data-Centric AI Requires Rethinking Data Notion
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 10:58 a.m. - 11:00 a.m.
|
Small Data in NLU: Proposals towards a Data-Centric Approach
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:00 a.m. - 11:02 a.m.
|
Towards better data discovery and collection with flow-based programming
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:00 a.m. - 11:15 a.m.
|
Q&A Lightning Talks - Challenge Problems and Theory
(
Q&A Session
)
>
SlidesLive Video
|
Cody Coleman · Vijay Janapa Reddi
🔗
|
Tue 11:15 a.m. - 11:20 a.m.
|
Break
|
🔗
|
Tue 11:20 a.m. - 11:30 a.m.
|
Facebook - Data Centric Infrastructure
(
Invited Talk
)
>
SlidesLive Video
|
Douwe Kiela
🔗
|
Tue 11:30 a.m. - 11:32 a.m.
|
Lightning Talks - Responsibility and Ethics
(
Intro
)
>
|
Sharon Zhou · Carole-Jean Wu
🔗
|
Tue 11:32 a.m. - 11:34 a.m.
|
Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:34 a.m. - 11:36 a.m.
|
Feminist Curation of Text for Data-centric AI
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:36 a.m. - 11:38 a.m.
|
Addressing Content Selection Bias in Creating Datasets for Hate Speech Detection
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:38 a.m. - 11:40 a.m.
|
Data Cards: Purposeful and Transparent Documentation for Responsible AI
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:42 a.m. - 11:44 a.m.
|
A Data-Centric Behavioral Machine Learning Platform to Reduce Health Inequalities
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:44 a.m. - 11:46 a.m.
|
Simultaneous Improvement of ML Model Fairness and Performance by Identifying Bias in Data
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:48 a.m. - 11:50 a.m.
|
Building Legal Datasets
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:49 a.m. - 11:51 a.m.
|
DAG Card is the new Model Card
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 11:50 a.m. - 12:05 p.m.
|
Q&A Lightning Talks - Responsibility and Ethics
(
Q&A Session
)
>
SlidesLive Video
|
Vijay Janapa Reddi · Cody Coleman
🔗
|
Tue 12:05 p.m. - 12:10 p.m.
|
Break
|
🔗
|
Tue 12:10 p.m. - 12:50 p.m.
|
Q&A with Morning Invited + Keynote Speakers + Closing Remarks
(
Q&A Session
)
>
SlidesLive Video
|
Andrew Ng · Sharon Zhou
🔗
|
Tue 12:50 p.m. - 1:20 p.m.
|
Break - watch the on-demand videos and ask questions in Rocket.Chat
|
🔗
|
Tue 1:20 p.m. - 1:35 p.m.
|
Alex Ratner and Chris Re - The Future of Data Centric AI
(
Keynote
)
>
SlidesLive Video
|
Christopher Ré
🔗
|
Tue 1:35 p.m. - 1:45 p.m.
|
Technical Debt in ML: A Data-Centric View
(
Invited Talk
)
>
SlidesLive Video
|
D. Sculley
🔗
|
Tue 1:45 p.m. - 1:47 p.m.
|
Lightning Talks - Data Synthesis and Datasets
(
Recorded Talks
)
>
|
Carole-Jean Wu
🔗
|
Tue 1:47 p.m. - 1:49 p.m.
|
Towards Systematic Evaluation in Machine Learning through Automated Stress Test Creation
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:49 p.m. - 1:51 p.m.
|
Bridging the gap to real-world for network intrusion detection systems with data-centric approach
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:51 p.m. - 1:53 p.m.
|
IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:53 p.m. - 1:55 p.m.
|
LSH methods for data deduplication in a Wikipedia artificial dataset
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:55 p.m. - 1:57 p.m.
|
Using Synthetic Images To Uncover Population Biases In Facial Landmarks Detection
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:57 p.m. - 1:59 p.m.
|
3D ImageNet: A data collection and labeling tool for Depth and RGB Images
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 1:59 p.m. - 2:01 p.m.
|
Augment & Valuate: A Data Enhancement Pipeline for data-centric AI
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:01 p.m. - 2:03 p.m.
|
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:03 p.m. - 2:05 p.m.
|
Sim2Real Docs: Domain Randomization for Documents in Natural Scenes using Ray-traced Rendering
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:07 p.m. - 2:09 p.m.
|
A First Look Towards One-Shot Object Detection with SPOT for Data-Efficient Learning
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:09 p.m. - 2:11 p.m.
|
Challenges of Working with Materials R&D Data
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:11 p.m. - 2:13 p.m.
|
Open-Sourcing Generative Models for Data-driven Robot Simulations
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:13 p.m. - 2:15 p.m.
|
Natural Adversarial Objects
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:15 p.m. - 2:40 p.m.
|
Q&A for Lightning Talks - Datasets and Data Synthesis
(
Q&A Session
)
>
SlidesLive Video
|
Greg Diamos · Carole-Jean Wu
🔗
|
Tue 2:40 p.m. - 2:45 p.m.
|
Break
|
🔗
|
Tue 2:45 p.m. - 2:55 p.m.
|
Curtis Northcutt
(
Invited Talk
)
>
SlidesLive Video
|
Curtis Northcutt
🔗
|
Tue 2:55 p.m. - 2:57 p.m.
|
Lightning Talks - Data Quality and Iteration
(
Intro
)
>
|
Greg Diamos
🔗
|
Tue 2:57 p.m. - 2:59 p.m.
|
DiagnosisQA: A semi-automated pipeline for developing clinician validated diagnosis specific QA datasets.
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 2:59 p.m. - 3:01 p.m.
|
Contrasting the Profiles of Easy and Hard Observations in a Dataset
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:01 p.m. - 3:03 p.m.
|
Self-supervised Semi-supervised Learning for Data Labeling and Quality Evaluation
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:03 p.m. - 3:05 p.m.
|
Engineering AI Tools for Systematic and Scalable Quality Assessment in Magnetic Resonance Imaging
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:05 p.m. - 3:07 p.m.
|
PyHard: a novel tool for generating hardness embeddings to support data-centric analysis
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:07 p.m. - 3:09 p.m.
|
Increasing Data Diversity with Iterative Sampling to Improve Performance
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:09 p.m. - 3:11 p.m.
|
Exploiting Proximity Search and Easy Examples to Select Rare Events
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:11 p.m. - 3:13 p.m.
|
Fantastic Data and How to Query Them
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:13 p.m. - 3:15 p.m.
|
Exploiting Domain Knowledge for Efficient Data-centric Session-based Recommendation model
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:15 p.m. - 3:17 p.m.
|
Automatic Data Quality Evaluation for Text Classification
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:15 p.m. - 3:35 p.m.
|
Q&A for Lightning Talks - Data Quality and Iteration
(
Q&A Session
)
>
SlidesLive Video
|
Carole-Jean Wu · Greg Diamos
🔗
|
Tue 3:35 p.m. - 3:40 p.m.
|
Break
|
🔗
|
Tue 3:40 p.m. - 3:50 p.m.
|
Anima Anandkumar
(
Invited Talk
)
>
SlidesLive Video
|
Anima Anandkumar
🔗
|
Tue 3:50 p.m. - 3:52 p.m.
|
Lightning Talks - Data Labeling
(
Intro
)
>
|
Carole-Jean Wu
🔗
|
Tue 3:51 p.m. - 3:53 p.m.
|
Decreasing Annotation Burden of Pairwise Comparisons with Human-in-the-Loop Sorting: Application in Medical Image Artifact Rating
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:53 p.m. - 3:55 p.m.
|
Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:53 p.m. - 3:55 p.m.
|
Towards a Shared Rubric for Dataset Annotation
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:55 p.m. - 3:57 p.m.
|
Influence of human-expert labels on a neonatal seizure detector based on a convolutional neural network
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:57 p.m. - 3:59 p.m.
|
Utilizing Driving Context to Increase the Annotation Efficiency of Imbalanced Gaze Image Data
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 3:59 p.m. - 4:01 p.m.
|
Highly Efficient Representation and Active Learning Framework and Its Application to Imbalanced Medical Image Classification
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 4:01 p.m. - 4:03 p.m.
|
Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 4:03 p.m. - 4:05 p.m.
|
Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 4:05 p.m. - 4:07 p.m.
|
Ontolabeling: Re-Thinking Data Labeling For Computer Vision
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 4:07 p.m. - 4:09 p.m.
|
Single-Click 3D Object Annotation on LiDAR Point Clouds
(
Lightning Talk
)
>
SlidesLive Video
|
🔗
|
Tue 4:10 p.m. - 4:30 p.m.
|
Q&A for Lightning Talks - Data Labeling
(
Q&A Session
)
>
SlidesLive Video
|
Carole-Jean Wu · Greg Diamos
🔗
|
Tue 4:30 p.m. - 5:10 p.m.
|
Q&A with Afternoon Invited + Keynote Speakers + Closing Remarks
(
Q&A Session
)
>
SlidesLive Video
|
Andrew Ng · Sharon Zhou
🔗
|
Tue 5:00 p.m. - 6:00 p.m.
|
Below are the videos of accepted Lightning Talks that are not presented in the livestream
(
Lightning Talks (videos)
)
>
|
🔗
|
-
|
A Hybrid Bayesian Model to Analyse Healthcare Data
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
A New Tool for Efficiently Generating Quality Estimation Datasets
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Automatic Knowledge Augmentation for Generative Commonsense Reasoning
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Tabular Engineering with Automunge
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
A Probabilistic Framework for Knowledge Graph Data Augmentation
(
video
)
>
|
🔗
|
-
|
FedHist: A Federated-First Dataset for Learning in Healthcare
(
video
)
>
|
🔗
|
-
|
Dialectal Voice: An Open-Source Voice Dataset and Automatic Speech Recognition model for Moroccan Arabic dialectal
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
SCIMAT: Science and Mathematics Dataset
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Annotation Quality Framework - Accuracy, Credibility, and Consistency
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Diagnosing severity levels of Autism Spectrum Disorder with Machine Learning
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Comparing Data Augmentation and Annotation Standardization to Improve End-to-end Spoken Language Understanding Models
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Challenges and Solutions to build a Data Pipeline to Identify Anomalies in Enterprise System Performance
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Unleashing the Power of Industrial Big Data through Scalable Manual Labeling
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
nferX: a case study on data-centric NLP in biomedicine
(
video
)
>
link
|
🔗
|
-
|
All in one Data Cleansing Tool
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
A concept for fitness-for-use evaluation in Machine Learning pipelines
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Vietnamese Speech-based Question Answering over Car Manuals
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Towards a Taxonomy of Graph Learning Datasets
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Bridging the gap between AI and the life sciences: towards a standardized multi-omics data type
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Data preparation for training CNNs: Application to vibration-based condition monitoring
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Evaluating Machine Learning Models for Internet Network Security with Data Slices
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
AutoDQ: Automatic Data Quality for Financial Data
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Combining Data-driven Supervision with Human-in-the-loop Feedback for Entity Resolution
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Two Approaches to Building Dialogue Systems for People on the Spectrum
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
What can Data-Centric AI Learn from Data and ML Engineering?
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Annotation Inconsistency and Entity Bias in MultiWOZ
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Seg-Diff: Checkpoints Are All You Need
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
AutoDC: Automated data-centric processing
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Data Augmentation for Intent Classification
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
InfiniteForm: A synthetic, minimal bias dataset for fitness applications
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing
(
video
)
>
|
🔗
|
-
|
Topological Deep Learning
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Fix your Model by Fixing your Datasets
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Data Expressiveness and Its Use in Data-centric AI
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Debiasing Pre-Trained Sentence Encoders With Word Dropouts on Fine-Tuning Data
(
video
)
>
|
🔗
|
-
|
Towards a Framework for Data Excellence in Data-Centric AI: Lessons from the Semantic Web
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Homogenization of Existing Inertial-Based Datasets to Support Human Activity Recognition
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Can machines learn to see without visual databases?
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Data Agnostic Image Annotation
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
On Biased Systems and Data
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
Data vast and low in variance: Augment machine learning pipelines with dataset profiles to improve data quality without sacrificing scale
(
video
)
>
SlidesLive Video
|
🔗
|
-
|
CogALex 2.0: Impact of Data Quality on Lexical-Semantic Relation Prediction
(
video
)
>
SlidesLive Video
|
🔗
|