Workshop
Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Frank Schneider · Zachary Nado · Philipp Hennig · George Dahl · Naman Agarwal
Theater B
Fri 2 Dec, 6:30 a.m. PST
Workshop Description
Training contemporary neural networks is a lengthy and often costly process, both in human designer time and in compute resources. Although the field has invented numerous approaches, neural network training still usually involves an inconvenient amount of “babysitting” to get the model to train properly. This not only requires enormous compute resources but also makes deep learning less accessible to outsiders and newcomers. This workshop centers on the question “How can we train neural networks faster?” by focusing on the effects that algorithms (not hardware or software developments) have on the training time of neural networks. These algorithmic improvements can come in the form of novel methods, e.g., new optimizers or more efficient data selection strategies, or of empirical experience, e.g., best practices for quickly identifying well-performing hyperparameter settings or informative metrics to monitor during training.
We all think we know how to train deep neural networks, yet we all seem to have different ideas. Ask any deep learning practitioner about the best practices of neural network training, and you will often hear a collection of arcane recipes. Frustratingly, these hacks vary wildly between companies and teams. This workshop offers a platform to discuss these ideas and to agree on what is actually known and what is just noise. In this sense, this will not be an “optimization workshop” in the mathematical sense (of which there have been several in the past, of course).
To this end, the workshop aims to connect two communities: researchers who develop new algorithms for faster neural network training, such as new optimization methods or deep learning architectures, and practitioners who, through their work on real-world problems, increasingly rely on “tricks of the trade”. By closing the gap between research and applications, the workshop seeks to identify the most relevant issues that currently hinder faster neural network training in practice.
Topics
Among the topics addressed by the workshop are:
- What “best practices” for faster neural network training are used in practice and can we learn from them to build better algorithms?
- What are painful lessons learned while training deep learning models?
- What are the most needed algorithmic improvements for neural network training?
- How can we ensure that research on training methods for deep learning has practical relevance?
Important Dates
- Submission Deadline: September 30, 2022, 7:00 a.m. UTC (updated)
- Accept/Reject Notification Date: October 20, 2022, 7:00 a.m. UTC (updated)
- Workshop Date: December 2, 2022
Schedule
Fri 6:30 a.m. - 6:40 a.m. | Welcome and Opening Remarks
Fri 6:40 a.m. - 7:10 a.m. | Invited Talk by Aakanksha Chowdhery
Fri 7:10 a.m. - 7:20 a.m. | Q & A with Aakanksha Chowdhery
Fri 7:20 a.m. - 7:50 a.m. | Benchmarking Training Algorithms by Zachary Nado (Talk)
Fri 7:50 a.m. - 8:00 a.m. | Q & A with Zachary Nado
Fri 8:00 a.m. - 8:15 a.m. | Coffee Break
Fri 8:15 a.m. - 8:45 a.m. | Invited Talk by Jimmy Ba
Fri 8:45 a.m. - 8:55 a.m. | Q & A with Jimmy Ba
Fri 8:55 a.m. - 9:25 a.m. | Invited Talk by Susan Zhang
Fri 9:25 a.m. - 9:35 a.m. | Q & A with Susan Zhang
Fri 9:35 a.m. - 10:00 a.m. | Interactive Audience Session (Q & A)
Fri 10:00 a.m. - 11:30 a.m. | Lunch Break
Fri 11:30 a.m. - 12:00 p.m. | Invited Talk by Boris Dayma
Fri 12:00 p.m. - 12:10 p.m. | Q & A with Boris Dayma
Fri 12:10 p.m. - 1:00 p.m. | Poster Session and Open Discussion
Fri 1:00 p.m. - 1:15 p.m. | Coffee Break
Fri 1:15 p.m. - 1:45 p.m. | Invited Talk by Stanislav Fort
Fri 1:45 p.m. - 1:55 p.m. | Q & A with Stanislav Fort
Fri 1:55 p.m. - 3:00 p.m. | Panel Discussion
Posters
- Can Calibration Improve Sample Prioritization? | Ganesh Tata · Gautham Krishna Gudur · Gopinath Chennupati · Mohammad Emtiyaz Khan
- Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets | Mansheej Paul · Feng Chen · Brett Larsen · Jonathan Frankle · Surya Ganguli · Gintare Karolina Dziugaite
- Layover Intermediate Layer for Multi-Label Classification in Efficient Transfer Learning | Seongha Eom · Taehyeon Kim · Se-Young Yun
- A Scalable Technique for Weak-Supervised Learning with Domain Constraints | Sudhir Agarwal · Anu Sreepathy · Lalla M
- IMPON: Efficient IMPortance sampling with ONline regression for rapid neural network training | Vignesh Ganapathiraman · Francisco Calderon · Anila Joshi
- Relating Regularization and Generalization through the Intrinsic Dimension of Activations | Bradley Brown · Jordan Juravsky · Anthony Caterini · Gabriel Loaiza-Ganem
- The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon | Vimal Thilak · Etai Littwin · Shuangfei Zhai · Omid Saremi · Roni Paiss · Joshua Susskind
- Breadth-first pipeline parallelism | Joel Lamy-Poirier
- Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization | Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
- Fast Implicit Constrained Optimization of Non-decomposable Objectives for Deep Networks | Yatong Chen · Abhishek Kumar · Yang Liu · Ehsan Amid
- Efficient regression with deep neural networks: how many datapoints do we need? | Daniel Lengyel · Anastasia Borovykh
- Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization | Maximilian Mueller · Matthias Hein
- Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging | Jean Kaddour
- Batch size selection by stochastic optimal control | Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting
- MC-DARTS: Model Size Constrained Differentiable Architecture Search | Kazuki Hemmi · Yuki Tanigaki · Masaki Onishi
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models | Xingyu Xie · Pan Zhou · Huan Li · Zhouchen Lin · Shuicheng Yan
- Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms | Pan Zhou · Xingyu Xie · Shuicheng Yan
- Active Learning is a Strong Baseline for Data Subset Selection | Dongmin Park · Dimitris Papailiopoulos · Kangwook Lee
- APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations | Elan Rosenfeld · Preetum Nakkiran · Hadi Pouransari · Oncel Tuzel · Fartash Faghri
- LOFT: Finding Lottery Tickets through Filter-wise Training | Qihan Wang · Chen Dun · Fangshuo Liao · Christopher Jermaine · Anastasios Kyrillidis
- Trajectory ensembling for fine tuning - performance gains without modifying training | Louise Anderson-Conway · Vighnesh Birodkar · Saurabh Singh · Hossein Mobahi · Alexander Alemi
- Training a Vision Transformer from scratch in less than 24 hours with 1 GPU | Saghar Irandoust · Thibaut Durand · Yunduz Rakhmangulova · Wenjie Zi · Hossein Hajimirsadeghi
- PyHopper - A Plug-and-Play Hyperparameter Optimization Engine | Mathias Lechner · Ramin Hasani · Sophie Neubauer · Philipp Neubauer · Daniela Rus
- Faster and Cheaper Energy Demand Forecasting at Scale | Fabien Bernier · Matthieu Jimenez · Maxime Cordy · Yves Le Traon
- Late-Phase Second-Order Training | Lukas Tatzel · Philipp Hennig · Frank Schneider
- SADT: Combining Sharpness-Aware Minimization with Self-Distillation for Improved Model Generalization | Masud An Nur Islam Fahim · Jani Boutellier
- Learnable Graph Convolutional Attention Networks | Adrián Javaloy · Pablo Sanchez-Martin · Amit Levi · Isabel Valera
- When & How to Transfer with Transfer Learning | Adrián Tormos · Dario Garcia-Gasulla · Victor Gimenez-Abalos · Sergio Alvarez-Napagao
- FastCPH: Efficient Survival Analysis for Neural Networks | Xuelin Yang · Louis F Abraham · Sejin Kim · Petr Smirnov · Feng Ruan · Benjamin Haibe-Kains · Robert Tibshirani
- Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Jacob Portes · Davis Blalock · Cory Stephenson · Jonathan Frankle
- Feature Encodings for Gradient Boosting with Automunge | Nicholas Teague