Sat 6:50 a.m. - 7:00 a.m.
|
Opening Remarks
(
Opening Remarks
)
>
SlidesLive Video
|
🔗
|
Sat 7:00 a.m. - 7:45 a.m.
|
From algorithms to neural networks and back
(
Invited Talk
)
>
SlidesLive Video
|
Andrej Risteski
🔗
|
Sat 7:45 a.m. - 8:30 a.m.
|
How do two-layer neural networks learn complex functions from data over time?
(
Invited Talk
)
>
SlidesLive Video
|
Florent Krzakala
🔗
|
Sat 8:30 a.m. - 8:40 a.m.
|
Feature Learning in Infinite-Depth Neural Networks
(
Oral
)
>
link
SlidesLive Video
|
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
🔗
|
Sat 8:40 a.m. - 8:50 a.m.
|
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
(
Oral
)
>
link
SlidesLive Video
|
Yilong Qin · Andrej Risteski
🔗
|
Sat 8:50 a.m. - 9:00 a.m.
|
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
(
Oral
)
>
link
SlidesLive Video
|
Song Mei · Yuchen Wu
🔗
|
Sat 9:00 a.m. - 9:10 a.m.
|
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
(
Oral
)
>
link
SlidesLive Video
|
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu
🔗
|
Sat 9:10 a.m. - 10:10 a.m.
|
Poster Session
(
Poster Session
)
>
|
🔗
|
Sat 10:10 a.m. - 11:15 a.m.
|
Lunch Break
(
Lunch Break
)
>
|
🔗
|
Sat 11:15 a.m. - 12:00 p.m.
|
Benefits of learning with symmetries: eigenvectors, graph representations and sample complexity
(
Invited Talk
)
>
SlidesLive Video
|
Stefanie Jegelka
🔗
|
Sat 12:00 p.m. - 12:15 p.m.
|
Break
|
🔗
|
Sat 12:15 p.m. - 1:00 p.m.
|
Adaptivity in Domain Adaptation and Friends
(
Invited Talk
)
>
SlidesLive Video
|
Samory Kpotufe
🔗
|
Sat 1:00 p.m. - 1:10 p.m.
|
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
(
Oral
)
>
link
SlidesLive Video
|
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
🔗
|
Sat 1:10 p.m. - 1:20 p.m.
|
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
(
Oral
)
>
link
SlidesLive Video
|
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu
🔗
|
Sat 1:20 p.m. - 1:30 p.m.
|
In-Context Convergence of Transformers
(
Oral
)
>
link
SlidesLive Video
|
Yu Huang · Yuan Cheng · Yingbin Liang
🔗
|
Sat 1:30 p.m. - 1:40 p.m.
|
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
(
Oral
)
>
link
SlidesLive Video
|
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
🔗
|
Sat 1:40 p.m. - 1:50 p.m.
|
Linear attention is (maybe) all you need (to understand transformer optimization)
(
Oral
)
>
link
SlidesLive Video
|
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
🔗
|
Sat 1:50 p.m. - 2:00 p.m.
|
Closing Remarks
(
Closing Remarks
)
>
SlidesLive Video
|
🔗
|
Sat 2:00 p.m. - 3:00 p.m.
|
Poster Session
(
Poster Session
)
>
|
🔗
|
-
|
A PAC-Bayesian Perspective on the Interpolating Information Criterion
(
Poster
)
>
link
|
Liam Hodgkinson · Chris van der Heide · Robert Salomone · Fred Roosta · Michael Mahoney
🔗
|
-
|
Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective
(
Poster
)
>
link
|
Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
🔗
|
-
|
Linear attention is (maybe) all you need (to understand transformer optimization)
(
Poster
)
>
link
|
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
🔗
|
-
|
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
(
Poster
)
>
link
|
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
🔗
|
-
|
Feature Learning in Infinite-Depth Neural Networks
(
Poster
)
>
link
|
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
🔗
|
-
|
Variational Classification
(
Poster
)
>
link
|
Shehzaad Dhuliawala · Mrinmaya Sachan · Carl Allen
🔗
|
-
|
Implicit biases in multitask and continual learning from a backward error analysis perspective
(
Poster
)
>
link
|
Benoit Dherin
🔗
|
-
|
Spectrum Extraction and Clipping for Implicitly Linear Layers
(
Poster
)
>
link
|
Ali Ebrahimpour-Boroojeny · Matus Telgarsky · Hari Sundaram
🔗
|
-
|
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
(
Poster
)
>
link
|
Mingze Wang · Lei Wu
🔗
|
-
|
Curvature-Dimension Tradeoff for Generalization in Hyperbolic Space
(
Poster
)
>
link
|
Nicolás Alvarado · Hans Lobel · Mircea Petrache
🔗
|
-
|
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
(
Poster
)
>
link
|
GuanWen Qiu · Da Kuang · Surbhi Goel
🔗
|
-
|
Unveiling the Hessian's Connection to the Decision Boundary
(
Poster
)
>
link
|
Mahalakshmi Sabanayagam · Freya Behrens · Urte Adomaityte · Anna Dawid
🔗
|
-
|
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
(
Poster
)
>
link
|
Zixuan Zhang · Kaiqi Zhang · Minshuo Chen · Yuma Takeda · Mengdi Wang · Tuo Zhao · Yu-Xiang Wang
🔗
|
-
|
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
(
Poster
)
>
link
|
Ekaterina Lobacheva · Eduard Pokonechny · Maxim Kodryan · Dmitry Vetrov
🔗
|
-
|
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
(
Poster
)
>
link
|
Atli Kosson · Dongyang Fan · Martin Jaggi
🔗
|
-
|
Generalization Guarantees of Deep ResNets in the Mean-Field Regime
(
Poster
)
>
link
|
Yihang Chen · Fanghui Liu · Yiping Lu · Grigorios Chrysos · Volkan Cevher
🔗
|
-
|
Theoretical Explanation for Generalization from Adversarial Perturbations
(
Poster
)
>
link
|
Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki
🔗
|
-
|
In-Context Convergence of Transformers
(
Poster
)
>
link
|
Yu Huang · Yuan Cheng · Yingbin Liang
🔗
|
-
|
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
(
Poster
)
>
link
|
Yatin Dandi · Florent Krzakala · Bruno Loureiro · Luca Pesce · Ludovic Stephan
🔗
|
-
|
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
(
Poster
)
>
link
|
Ziqiao Wang · Yongyi Mao
🔗
|
-
|
Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors
(
Poster
)
>
link
|
Xuchen Gong · Tianwen Fu
🔗
|
-
|
Transformers as Support Vector Machines
(
Poster
)
>
link
|
Davoud Ataee Tarzanagh · Yingcong Li · Christos Thrampoulidis · Samet Oymak
🔗
|
-
|
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
(
Poster
)
>
link
|
Juno Kim · Kakei Yamamoto · Kazusato Oko · Zhuoran Yang · Taiji Suzuki
🔗
|
-
|
A Theoretical Study of Dataset Distillation
(
Poster
)
>
link
|
Zachary Izzo · James Zou
🔗
|
-
|
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
(
Poster
)
>
link
|
Deqing Fu · Tian-qi Chen · Robin Jia · Vatsal Sharan
🔗
|
-
|
Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
(
Poster
)
>
link
|
Kajetan Schweighofer · Lukas Aichberger · Mykyta Ielanskyi · Sepp Hochreiter
🔗
|
-
|
In-Context Learning on Unstructured Data: Softmax Attention as a Mixture of Experts
(
Poster
)
>
link
|
Kevin Christian Wibisono · Yixin Wang
🔗
|
-
|
Attention-Only Transformers and Implementing MLPs with Attention Heads
(
Poster
)
>
link
|
Robert Huben · Valerie Morris
🔗
|
-
|
Privacy at Interpolation: Precise Analysis for Random and NTK Features
(
Poster
)
>
link
|
Simone Bombari · Marco Mondelli
🔗
|
-
|
Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation
(
Poster
)
>
link
|
Chinmaya Kausik · Kashvi Srivastava · Rishi Sonthalia
🔗
|
-
|
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
(
Poster
)
>
link
|
Behrad Moniri · Donghwan Lee · Hamed Hassani · Edgar Dobriban
🔗
|
-
|
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
(
Poster
)
>
link
|
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu
🔗
|
-
|
How does Gradient Descent Learn Features: A Local Analysis for Regularized Two-Layer Neural Networks
(
Poster
)
>
link
|
Mo Zhou · Rong Ge
🔗
|
-
|
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
(
Poster
)
>
link
|
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu
🔗
|
-
|
Provably Efficient CVaR RL in Low-rank MDPs
(
Poster
)
>
link
|
Yulai Zhao · Wenhao Zhan · Xiaoyan Hu · Ho-fung Leung · Farzan Farnia · Wen Sun · Jason Lee
🔗
|
-
|
Analysis of Task Transferability in Large Pre-trained Classifiers
(
Poster
)
>
link
|
Akshay Mehra · Yunbei Zhang · Jihun Hamm
🔗
|
-
|
On Scale-Invariant Sharpness Measures
(
Poster
)
>
link
|
Behrooz Tahmasebi · Ashkan Soleymani · Stefanie Jegelka · Patrick Jaillet
🔗
|
-
|
Gibbs-Based Information Criteria and the Over-Parameterized Regime
(
Poster
)
>
link
|
Haobo Chen · Yuheng Bu · Gregory Wornell
🔗
|
-
|
Grokking modular arithmetic can be explained by margin maximization
(
Poster
)
>
link
|
Mohamad Amin Mohamadi · Zhiyuan Li · Lei Wu · Danica J. Sutherland
🔗
|
-
|
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
(
Poster
)
>
link
|
Fadhel Ayed · Francois Caron · Paul Jung · Juho Lee · Hoil Lee · Hongseok Yang
🔗
|
-
|
On the Computational Complexity of Inverting Generative Models
(
Poster
)
>
link
|
Feyza Duman Keles · Chinmay Hegde
🔗
|
-
|
Flow-Based High-Dimensionally Distributional Robust Optimization
(
Poster
)
>
link
|
Chen Xu · Jonghyeok Lee · Xiuyuan Cheng · Yao Xie
🔗
|
-
|
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
(
Poster
)
>
link
|
Licong Lin · Yu Bai · Song Mei
🔗
|
-
|
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
(
Poster
)
>
link
|
Tianyu Guo · Wei Hu · Song Mei · Huan Wang · Caiming Xiong · Silvio Savarese · Yu Bai
🔗
|
-
|
A Theoretical Explanation of Deep RL Performance in Stochastic Environments
(
Poster
)
>
link
|
Cassidy Laidlaw · Banghua Zhu · Stuart J Russell · Anca Dragan
🔗
|
-
|
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
(
Poster
)
>
link
|
Song Mei · Yuchen Wu
🔗
|
-
|
Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line
(
Poster
)
>
link
|
Rishi Sonthalia · Xinyue (Serena) Li · Bochao Gu
🔗
|
-
|
Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice
(
Poster
)
>
link
|
Mahdiyar Molahasani · Ali Etemad · Michael Greenspan
🔗
|
-
|
SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning
(
Poster
)
>
link
|
Alice Bizeul · Carl Allen
🔗
|
-
|
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
(
Poster
)
>
link
|
Atli Kosson · Bettina Messmer · Martin Jaggi
🔗
|
-
|
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
(
Poster
)
>
link
|
Miao Lu · Beining Wu · Xiaodong Yang · Difan Zou
🔗
|
-
|
On Compositionality and Emergence in Physical Systems Generative Modeling
(
Poster
)
>
link
|
Justin Diamond
🔗
|
-
|
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
(
Poster
)
>
link
|
Felix Sarnthein · Sidak Pal Singh · Antonio Orvieto · Thomas Hofmann
🔗
|
-
|
The Expressive Power of Transformers with Chain of Thought
(
Poster
)
>
link
|
William Merrill · Ashish Sabharwal
🔗
|
-
|
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
(
Poster
)
>
link
|
Hongkang Li · Meng Wang · Songtao Lu · Hui Wan · Xiaodong Cui · Pin-Yu Chen
🔗
|
-
|
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
(
Poster
)
>
link
|
Yilong Qin · Andrej Risteski
🔗
|
-
|
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
(
Poster
)
>
link
|
Qingyue Zhao · Banghua Zhu
🔗
|
-
|
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
(
Poster
)
>
link
|
Elan Rosenfeld · Andrej Risteski
🔗
|
-
|
MoXCo: How I learned to stop exploring and love my local minima?
(
Poster
)
>
link
|
Esha Singh · Shoham Sabach · Yu-Xiang Wang
🔗
|
-
|
First-order ANIL provably learns representations despite overparametrisation
(
Poster
)
>
link
|
Oguz Kaan Yuksel · Etienne Boursier · Nicolas Flammarion
🔗
|
-
|
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
(
Poster
)
>
link
|
Eduardo Dadalto Câmara Gomes · Marco Romanelli · Georg Pichler · Pablo Piantanida
🔗
|
-
|
Non-Vacuous Generalization Bounds for Large Language Models
(
Poster
)
>
link
|
Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson
🔗
|
-
|
Learning from setbacks: the impact of adversarial initialization on generalization performance
(
Poster
)
>
link
|
Yatin Dandi · Stefani Karp · Francesca Mignacco · Kavya Ravichandran
🔗
|
-
|
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
(
Poster
)
>
link
|
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
🔗
|
-
|
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
(
Poster
)
>
link
|
Szilvia Ujváry · Gergely Flamich · Vincent Fortuin · José Miguel Hernández-Lobato
🔗
|
-
|
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
(
Poster
)
>
link
|
Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Ila Fiete · Andrey Gromov · Sanmi Koyejo
🔗
|
-
|
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
(
Poster
)
>
link
|
Yuqing Wang · Zhenghao Xu · Tuo Zhao · Molei Tao
🔗
|
-
|
Toward Student-oriented Teacher Network Training for Knowledge Distillation
(
Poster
)
>
link
|
Chengyu Dong · Liyuan Liu · Jingbo Shang
🔗
|
-
|
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
(
Poster
)
>
link
|
Anna Bair · Hongxu Yin · Maying Shen · Pavlo Molchanov · Jose M. Alvarez
🔗
|
-
|
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations
(
Poster
)
>
link
|
Can Yaras · Peng Wang · Wei Hu · Zhihui Zhu · Laura Balzano · Qing Qu
🔗
|
-
|
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
(
Poster
)
>
link
|
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu
🔗
|
-
|
The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models
(
Poster
)
>
link
|
Satwik Bhattamishra · Phil Blunsom · Varun Kanade
🔗
|
-
|
Why Do We Need Weight Decay for Overparameterized Deep Networks?
(
Poster
)
>
link
|
Maksym Andriushchenko · Francesco D'Angelo · Aditya Vardhan Varre · Nicolas Flammarion
🔗
|
-
|
The Double-Edged Sword: Perception and Uncertainty in Inverse Problems
(
Poster
)
>
link
|
Regev Cohen · Ehud Rivlin · Daniel Freedman
🔗
|
-
|
Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting
(
Poster
)
>
link
|
Yutong Wang · Rishi Sonthalia · Wei Hu
🔗
|
-
|
On robust overfitting: adversarial training induced distribution matters
(
Poster
)
>
link
|
Runzhi Tian · Yongyi Mao
🔗
|
-
|
Are Graph Neural Networks Optimal Approximation Algorithms?
(
Poster
)
>
link
|
Morris Yau · Eric Lu · Nikolaos Karalias · Jessica Xu · Stefanie Jegelka
🔗
|
-
|
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
(
Poster
)
>
link
|
Yuandong Tian · Yiping Wang · Zhenyu Zhang · Beidi Chen · Simon Du
🔗
|