Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Cristiana Bentes · Animashree Anandkumar · Olivier Beaumont
Room 243 - 245
Sat 16 Dec, 6:15 a.m. PST
Unlock the potential of neural network training for science and the public good! This workshop brings together HPC and AI experts to tackle challenges in computational efficiency, scalability, and resource optimization, in both theory and applications.
Timezone: America/Los_Angeles
Schedule
Sat 6:15 a.m. - 6:50 a.m. | Poster Placement
Sat 6:50 a.m. - 7:00 a.m. | Opening Remarks (Talk) | Julia Gusak
Sat 7:00 a.m. - 7:30 a.m. | A Data-Centric View on Workflows that Couple HPC with Large-Scale Models (Invited Talk) | Ana Gainaru
Sat 7:30 a.m. - 8:00 a.m. | Rematerialization Algorithms for Memory-efficient Learning (Invited Talk) | Lionel Eyraud-Dubois
Sat 8:00 a.m. - 8:30 a.m. | Coffee Break
Sat 8:30 a.m. - 9:00 a.m. | Navigating the Landscape of Enormous AI Model Training (Invited Talk) | Yang You
Sat 9:00 a.m. - 9:30 a.m. | Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models (Invited Talk) | Olatunji Ruwase
Sat 9:30 a.m. - 10:00 a.m. | Contributed Talks (Talk)
Sat 9:31 a.m. - 9:36 a.m. | Training and inference of large language models using 8-bit floating point (Contributed Talk & Poster) | Sergio Perez · Yan Zhang · James Briggs · Charles Blake · Josh Levy-Kramer · Paul Balanca · Carlo Luschi · Stephen Barlow · Andrew Fitzgibbon
Sat 9:37 a.m. - 9:42 a.m. | MatFormer: Nested Transformer for Elastic Inference (Contributed Talk & Poster) | Fnu Devvrit · Sneha Kudugunta · Aditya Kusupati · Tim Dettmers · Kaifeng Chen · Inderjit Dhillon · Yulia Tsvetkov · Hannaneh Hajishirzi · Sham Kakade · Ali Farhadi · Prateek Jain
Sat 9:43 a.m. - 9:48 a.m. | Sparse Backpropagation for MoE Training (Contributed Talk & Poster) | Liyuan Liu · Jianfeng Gao · Weizhu Chen
Sat 9:49 a.m. - 9:54 a.m. | Efficient Parallelization Layouts for Large-Scale Distributed Model Training (Contributed Talk & Poster) | Johannes Hagemann · Samuel Weinbach · Konstantin Dobler · Maximilian Schall · Gerard de Melo
Sat 9:55 a.m. - 10:00 a.m. | CoTFormer: More Tokens With Attention Make Up For Less Depth (Contributed Talk & Poster) | Amirkeivan Mohtashami · Matteo Pagliardini · Martin Jaggi
Sat 10:00 a.m. - 11:30 a.m. | Lunch
Sat 11:30 a.m. - 12:00 p.m. | Poster Session (Poster Session)
Sat 12:00 p.m. - 12:30 p.m. | Crafting Computational Efficiency for Large Models: Training Recipes, Scaling Strategies and Sparsity Sorcery with Specialized Hardware (Invited Talk) | Natalia Vassilieva
Sat 12:30 p.m. - 1:00 p.m. | Invited Talk by Databricks (Invited Talk)
Sat 1:00 p.m. - 1:30 p.m. | Coffee Break
Sat 1:30 p.m. - 2:00 p.m. | Efficient LLM Training and Inference on GPUs (Invited Talk) | Mohammad Shoeybi · Bryan Catanzaro
Sat 2:00 p.m. - 2:50 p.m. | Panel Discussion (Panel) | Yang You · Olatunji Ruwase · Natalia Vassilieva · Mohammad Shoeybi · Ana Gainaru · Lionel Eyraud-Dubois · Jean Kossaifi
Sat 2:50 p.m. - 3:00 p.m. | Closing Remarks (Talk) | Jean Kossaifi
Sat 3:00 p.m. - 3:30 p.m. | Poster Session (Poster Session)
- | AI4HPC: Library to Train AI Models on HPC Systems using CFD Datasets (Poster) | Eray Inanc · Rakesh Sarma · Marcel Aach · Rocco Sedona · Andreas Lintermann
- | Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale (Poster) | Gavia Gray · Anshul Samar · Joel Hestness
- | Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators (Poster) | Yaniv Blumenfeld · Itay Hubara · Daniel Soudry
- | ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation (Poster) | Divyang Doshi · Jung-Eun Kim
- | Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search (Poster) | Lei Chen
- | Remaining-Useful-Life Prediction and Uncertainty Quantification using LSTM Ensembles for Aircraft Engines (Poster) | Oishi Deb · Emmanouil Benetos · Philip Torr
- | LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (Poster) | Dacheng Li · Rulin Shao · Anze Xie · Eric Xing · Joseph Gonzalez · Ion Stoica · Xuezhe Ma · Hao Zhang
- | FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments (Poster) | Mert Unsal · Ali Maatouk · Antonio De Domenico · Nicola Piovesan · Fadhel Ayed
- | DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers (Poster) | Sarin Chandy · Varun Prashant Gangal · Yi Yang · Gabriel Maggiotti
- | Improving Deep Ensembles without Communication (Poster) | Konstantinos Pitas · Michael Arbel · Julyan Arbel
- | ConcatPlexer: Additional Dim1 Batching for Faster ViTs (Contributed Talk & Poster) | Donghoon Han · Seunghyeon Seo · Donghyeon Jeon · Jiho Jang · Chaerin Kong · Nojun Kwak
- | InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning (Poster) | Sharath Nittur Sridhar · Souvik Kundu · Sairam Sundaresan · Maciej Szankin · Anthony Sarah
- | ReLoRA: High-Rank Training Through Low-Rank Updates (Poster) | Vladislav Lialin · Sherin Muckatira · Namrata Shivagunde · Anna Rumshisky
- | Sparse Iso-FLOP Transformations for Maximizing Training Efficiency (Poster) | Vithursan Thangarasa · Shreyas Saxena · Abhay Gupta · Sean Lie
- | Embarrassingly Simple Dataset Distillation (Poster) | Yunzhen Feng · Shanmukha Ramakrishna Vedantam · Julia Kempe
- | Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs (Poster) | Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao
- | A Quadratic Synchronization Rule for Distributed Deep Learning (Poster) | Xinran Gu · Kaifeng Lyu · Sanjeev Arora · Jingzhao Zhang · Longbo Huang
- | DAREL: Data Reduction with Losses for Training Acceleration of Real and Hypercomplex Neural Networks (Poster) | Alexander Demidovskij · Aleksei Trutnev · Artyom Tugaryov · Igor Salnikov · Stanislav Pavlov
- | Accelerating Deep Learning using Ivy (Poster) | Guillermo Sanchez-Brizuela · Ved Patwardhan · Matthew Barrett · Paul Anderson · Mustafa Hani · Daniel Lenton
- | Something for (almost) nothing: improving deep ensemble calibration using unlabeled data (Poster) | Konstantinos Pitas · Julyan Arbel
- | LeanFlex-GKP: Advancing Hassle-Free Structured Pruning with Simple Flexible Group Count (Poster) | Jiamu Zhang · Shaochen (Henry) Zhong · Andrew Ye · Zirui Liu · Kaixiong Zhou · Xia Hu · Shuai Xu · Vipin Chaudhary
- | Patch Gradient Descent: Training Neural Networks on Very Large Images (Poster) | Deepak Gupta · Gowreesh Mago · Arnav Chavan · Dilip K. Prasad · Rajat Thomas
- | Batched Low-Rank Adaptation of Foundation Models (Poster) | Yeming Wen · Swarat Chaudhuri
- | Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models (Poster) | Oscar Key · Jean Kaddour · Pasquale Minervini
- | Early Weight Averaging meets High Learning Rates for LLM Pre-training (Poster) | Sunny Sanyal · Atula Neerkaje · Jean Kaddour · Abhishek Kumar · Sujay Sanghavi
- | Bandit-Driven Batch Selection for Robust Learning under Label Noise (Poster) | Michal Lisicki · Graham Taylor · Mihai Nica
- | Maestro: Uncovering Low-Rank Structures via Trainable Decomposition (Poster) | Samuel Horváth · Stefanos Laskaridis · Shashank Rajput · Hongyi Wang
- | Tiny Graph Convolutional Networks with Topologically Consistent Magnitude Pruning (Poster) | Hichem Sahbi
- | DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency (Poster) | Azhar Shaikh · Michael Cochez · Denis Diachkov · Michiel de Rijcke · Sahar Yousefi
- | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (Poster) | Mengzhou Xia · Tianyu Gao · Zhiyuan Zeng · Danqi Chen
- | A foundation for exact binarized morphological neural networks (Poster) | Theodore Aouad · Hugues Talbot
- | Training Bayesian Neural Networks with Sparse Subspace Variational Inference (Poster) | Junbo Li · Zichen Miao · Qiang Qiu · Ruqi Zhang
- | Task Arithmetic with LoRA for Continual Learning (Poster) | Rajas Chitale · Ankit Vaidya · Aditya Kane · Archana Ghotkar
- | Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning (Poster) | Colin Bellinger · Mark Crowley · Isaac Tamblyn
- | Cooperative Learning for Cost-Adaptive Inference (Poster) | Xingli Fang · Richard Bradford · Jung-Eun Kim
- | Generalisable Agents for Neural Network Optimisation (Poster) | Kale-ab Tessera · Callum R. Tilbury · Sasha Abramowitz · Ruan John de Kock · Omayma Mahjoub · Benjamin Rosman · Sara Hooker · Arnu Pretorius