Workshop
Has it Trained Yet? A Workshop for Algorithmic Efficiency in Practical Neural Network Training
Frank Schneider · Zachary Nado · Philipp Hennig · George Dahl · Naman Agarwal
Theater B
Fri 2 Dec, 6:30 a.m. PST
Workshop Description
Training contemporary neural networks is a lengthy and often costly process, both in human designer time and in compute resources. Although the field has invented numerous approaches, neural network training still usually involves an inconvenient amount of “babysitting” to get the model to train properly. This not only requires enormous compute resources but also makes deep learning less accessible to outsiders and newcomers. This workshop centers on the question “How can we train neural networks faster?” by focusing on the effects that algorithms (not hardware or software developments) have on the training time of neural networks. These algorithmic improvements can come in the form of novel methods, e.g., new optimizers or more efficient data selection strategies, or of empirical experience, e.g., best practices for quickly identifying well-performing hyperparameter settings or informative metrics to monitor during training.
We all think we know how to train deep neural networks, yet we all seem to have different ideas. Ask any deep learning practitioner about the best practices of neural network training, and you will often hear a collection of arcane recipes. Frustratingly, these hacks vary wildly between companies and teams. This workshop offers a platform to discuss these ideas and to agree on what is actually known and what is just noise. In this sense, this will not be an “optimization workshop” in the mathematical sense (of which there have been several in the past, of course).
To this end, the workshop aims to connect two communities: researchers who develop new algorithms for faster neural network training, such as new optimization methods or deep learning architectures, and practitioners who, through their work on real-world problems, increasingly rely on “tricks of the trade”. By closing the gap between research and applications, the workshop seeks to identify the most relevant issues that currently hinder faster neural network training in practice.
Topics
Among the topics addressed by the workshop are:
- What “best practices” for faster neural network training are used in practice and can we learn from them to build better algorithms?
- What are painful lessons learned while training deep learning models?
- What are the most needed algorithmic improvements for neural network training?
- How can we ensure that research on training methods for deep learning has practical relevance?
Important Dates
- Submission Deadline: September 30, 2022, 7:00 a.m. UTC (updated)
- Accept/Reject Notification Date: October 20, 2022, 7:00 a.m. UTC (updated)
- Workshop Date: December 2, 2022
Schedule
Fri 6:30 a.m. - 6:40 a.m. | Welcome and Opening Remarks
Fri 6:40 a.m. - 7:10 a.m. | Invited Talk by Aakanksha Chowdhery
Fri 7:10 a.m. - 7:20 a.m. | Q & A with Aakanksha Chowdhery
Fri 7:20 a.m. - 7:50 a.m. | Benchmarking Training Algorithms by Zachary Nado (Talk)
Fri 7:50 a.m. - 8:00 a.m. | Q & A with Zachary Nado
Fri 8:00 a.m. - 8:15 a.m. | Coffee Break
Fri 8:15 a.m. - 8:45 a.m. | Invited Talk by Jimmy Ba
Fri 8:45 a.m. - 8:55 a.m. | Q & A with Jimmy Ba
Fri 8:55 a.m. - 9:25 a.m. | Invited Talk by Susan Zhang
Fri 9:25 a.m. - 9:35 a.m. | Q & A with Susan Zhang
Fri 9:35 a.m. - 10:00 a.m. | Interactive Audience Session (Q & A)
Fri 10:00 a.m. - 11:30 a.m. | Lunch Break
Fri 11:30 a.m. - 12:00 p.m. | Invited Talk by Boris Dayma
Fri 12:00 p.m. - 12:10 p.m. | Q & A with Boris Dayma
Fri 12:10 p.m. - 1:00 p.m. | Poster Session and Open Discussion
Fri 1:00 p.m. - 1:15 p.m. | Coffee Break
Fri 1:15 p.m. - 1:45 p.m. | Invited Talk by Stanislav Fort
Fri 1:45 p.m. - 1:55 p.m. | Q & A with Stanislav Fort
Fri 1:55 p.m. - 3:00 p.m. | Panel Discussion
Posters
- Can Calibration Improve Sample Prioritization? | Ganesh Tata · Gautham Krishna Gudur · Gopinath Chennupati · Mohammad Emtiyaz Khan
- Unmasking the Lottery Ticket Hypothesis: Efficient Adaptive Pruning for Finding Winning Tickets | Mansheej Paul · Feng Chen · Brett Larsen · Jonathan Frankle · Surya Ganguli · Gintare Karolina Dziugaite
- Layover Intermediate Layer for Multi-Label Classification in Efficient Transfer Learning | Seongha Eom · Taehyeon Kim · Se-Young Yun
- A Scalable Technique for Weak-Supervised Learning with Domain Constraints | Sudhir Agarwal · Anu Sreepathy · Lalla M
- IMPON: Efficient IMPortance sampling with ONline regression for rapid neural network training | Vignesh Ganapathiraman · Francisco Calderon · Anila Joshi
- Relating Regularization and Generalization through the Intrinsic Dimension of Activations | Bradley Brown · Jordan Juravsky · Anthony Caterini · Gabriel Loaiza-Ganem
- The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon | Vimal Thilak · Etai Littwin · Shuangfei Zhai · Omid Saremi · Roni Paiss · Joshua Susskind
- Breadth-first pipeline parallelism | Joel Lamy-Poirier
- Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization | Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
- Fast Implicit Constrained Optimization of Non-decomposable Objectives for Deep Networks | Yatong Chen · Abhishek Kumar · Yang Liu · Ehsan Amid
- Efficient regression with deep neural networks: how many datapoints do we need? | Daniel Lengyel · Anastasia Borovykh
- Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization | Maximilian Mueller · Matthias Hein
- Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging | Jean Kaddour
- Batch size selection by stochastic optimal control | Jim Zhao · Aurelien Lucchi · Frank Proske · Antonio Orvieto · Hans Kersting
- MC-DARTS: Model Size Constrained Differentiable Architecture Search | Kazuki Hemmi · Yuki Tanigaki · Masaki Onishi
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models | Xingyu Xie · Pan Zhou · Huan Li · Zhouchen Lin · Shuicheng Yan
- Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms | Pan Zhou · Xingyu Xie · Shuicheng Yan
- Active Learning is a Strong Baseline for Data Subset Selection | Dongmin Park · Dimitris Papailiopoulos · Kangwook Lee
- APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations | Elan Rosenfeld · Preetum Nakkiran · Hadi Pouransari · Oncel Tuzel · Fartash Faghri
- LOFT: Finding Lottery Tickets through Filter-wise Training | Qihan Wang · Chen Dun · Fangshuo Liao · Christopher Jermaine · Anastasios Kyrillidis
- Trajectory ensembling for fine tuning - performance gains without modifying training | Louise Anderson-Conway · Vighnesh Birodkar · Saurabh Singh · Hossein Mobahi · Alexander Alemi
- Training a Vision Transformer from scratch in less than 24 hours with 1 GPU | Saghar Irandoust · Thibaut Durand · Yunduz Rakhmangulova · Wenjie Zi · Hossein Hajimirsadeghi
- PyHopper - A Plug-and-Play Hyperparameter Optimization Engine | Mathias Lechner · Ramin Hasani · Sophie Neubauer · Philipp Neubauer · Daniela Rus
- Faster and Cheaper Energy Demand Forecasting at Scale | Fabien Bernier · Matthieu Jimenez · Maxime Cordy · Yves Le Traon
- Late-Phase Second-Order Training | Lukas Tatzel · Philipp Hennig · Frank Schneider
- SADT: Combining Sharpness-Aware Minimization with Self-Distillation for Improved Model Generalization | Masud An Nur Islam Fahim · Jani Boutellier
- Learnable Graph Convolutional Attention Networks | Adrián Javaloy · Pablo Sanchez-Martin · Amit Levi · Isabel Valera
- When & How to Transfer with Transfer Learning | Adrián Tormos · Dario Garcia-Gasulla · Victor Gimenez-Abalos · Sergio Alvarez-Napagao
- FastCPH: Efficient Survival Analysis for Neural Networks | Xuelin Yang · Louis F Abraham · Sejin Kim · Petr Smirnov · Feng Ruan · Benjamin Haibe-Kains · Robert Tibshirani
- Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Jacob Portes · Davis Blalock · Cory Stephenson · Jonathan Frankle
- Feature Encodings for Gradient Boosting with Automunge | Nicholas Teague