Sat 8:50 a.m. - 9:00 a.m. | Opening Remarks
Sat 9:00 a.m. - 9:45 a.m. | Flat Minima and Generalization: from Matrix Sensing to Neural Networks (Invited Talk) | Maryam Fazel
Sat 9:45 a.m. - 10:30 a.m. | A Theoretical Perspective on Hardness of Sampling and Learning from Samples in High Dimensions (Invited Talk) | Lenka Zdeborová
Sat 10:30 a.m. - 10:45 a.m. | Classifier-Free Guidance is a Predictor-Corrector (Oral) | Arwen Bradley · Preetum Nakkiran
Sat 10:45 a.m. - 11:00 a.m. | Towards characterizing the value of edge embeddings in Graph Neural Networks (Oral) | Dhruv Rohatgi · Tanya Marwah · Zachary Lipton · Jianfeng Lu · Ankur Moitra · Andrej Risteski
Sat 11:00 a.m. - 11:15 a.m. | Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model (Oral) | Siyu Chen · Beining Wu · Miao Lu · Zhuoran Yang · Tianhao Wang
Sat 11:15 a.m. - 12:15 p.m. | Poster Session 1 (Poster Session)
Sat 12:15 p.m. - 1:30 p.m. | Lunch Break
Sat 1:30 p.m. - 2:15 p.m. | Scaling Deep Learning Optimization: Insights into Efficiency, Preconditioning, and Critical Batch Sizes (Invited Talk) | Sham Kakade
Sat 2:15 p.m. - 3:00 p.m. | Open problems in LLM Theory, DL theory, and the role of theory (Invited Talk) | Matus Telgarsky
Sat 3:00 p.m. - 3:15 p.m. | Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues (Oral) | Riccardo Grazzi · Julien Siems · Jörg Franke · Arber Zela · Frank Hutter · Massimiliano Pontil
Sat 3:15 p.m. - 3:30 p.m. | Understanding Factual Recall in Transformers via Associative Memories (Oral) | Eshaan Nichani · Jason Lee · Alberto Bietti
Sat 3:30 p.m. - 3:45 p.m. | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs (Oral) | Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei
Sat 3:45 p.m. - 4:00 p.m. | Mixture of Parrots: Mixtures of experts improve memorization more than reasoning (Oral) | Samy Jelassi · Clara Mohri · David Brandfonbrener · Alex Gu · Nikhil Vyas · Nikhil Anand · David Alvarez-Melis · Yuanzhi Li · Sham Kakade · Eran Malach
Sat 4:00 p.m. - 5:00 p.m. | Poster Session 2 (Poster Session)
- | Does Machine Bring in Extra Bias in Learning? Approximating Discrimination Within Models Quickly (Poster) | Yijun Bian · Yujie Luo · Ping Xu
- | On the Implicit Relation between Low-Rank Adaptation and Differential Privacy (Poster) | Saber Malekmohammadi · Golnoosh Farnadi
- | Self-Improvement in Language Models: The Sharpening Mechanism (Poster) | Audrey Huang · Adam Block · Dylan J Foster · Dhruv Rohatgi · Cyril Zhang · Max Simchowitz · Jordan Ash · Akshay Krishnamurthy
- | SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network (Poster) | Tomer Galanti · Zachary Siegel · Aparna Gupte · Tomaso Poggio
- | Information-Theoretic Generalization Bounds for Batch Reinforcement Learning (Poster) | Xingtu Liu
- | Emergence in non-neural models: grokking modular arithmetic via average gradient outer product (Poster) | Neil Mallinar · Daniel Beaglehole · Libin Zhu · Adityanarayanan Radhakrishnan · Parthe Pandit · Misha Belkin
- | Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data (Poster) | Binghui Li · Yuanzhi Li
- | Depth Extrapolation of Decoders Trained on Nested Structures (Poster) | Emile Richard
- | Diffusion Model Learns Low-Dimensional Distributions via Subspace Clustering (Poster) | Peng Wang · Huijie Zhang · Zekai Zhang · Siyi Chen · Yi Ma · Qing Qu
- | Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos (Poster) | Dayal Singh Kalra · Tianyu He · Maissam Barkeshli
- | Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression (Poster) | Juno Kim · Dimitri Meunier · Arthur Gretton · Taiji Suzuki · Zhu Li
- | How do students become teachers: A dynamical analysis for two-layer neural networks (Poster) | Zhenyu Zhu · Fanghui Liu · Volkan Cevher
- | Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection (Poster) | Aaron Alvarado Kristanto Julistiono · Davoud Ataee Tarzanagh · Navid Azizan
- | Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks (Poster) | Ouns El Harzli · Bernardo Grau
- | Convergence of Distributed Adaptive Optimization with Local Updates (Poster) | Ziheng Cheng · Margalit Glasgow
- | Progressive distillation induces an implicit curriculum (Poster) | Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
- | Comparing Implicit and Denoising Score-Matching Objectives (Poster) | Artem Artemev · Ayan Das · Farhang Nabiei · Alberto Bernacchia
- | Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling (Poster) | Xiao Li · Zekai Zhang · Xiang Li · Siyi Chen · Zhihui Zhu · Peng Wang · Qing Qu
- | Benign Overfitting in Single-Head Attention (Poster) | Roey Magen · Shuning Shang · Zhiwei Xu · Spencer Frei · Wei Hu · Gal Vardi
- | The GAN is dead; long live the GAN! A Modern GAN Baseline (Poster) | Nick Huang · Aaron Gokaslan · Volodymyr Kuleshov · James Tompkin
- | Information-Theoretic Foundations for Neural Scaling Laws (Poster) | Hong Jun Jeon · Benjamin Van Roy
- | Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training (Poster) | Anchit Jain · Rozhin Nobahari · Aristide Baratin · Stefano Sarao Mannelli
- | A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers (Poster) | Will Merrill · Ashish Sabharwal
- | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models (Poster) | Frederik Kunstner · Robin Yadav · Alan Milligan · Mark Schmidt · Alberto Bietti
- | Provable weak-to-strong generalization via benign overfitting (Poster) | David Wu · Anant Sahai
- | On Your Mark, Get Set, Warmup! (Poster) | Dayal Singh Kalra · Maissam Barkeshli
- | Continuous-Time Analysis of Adaptive Optimization and Normalization (Poster) | Rhys Gould · Hidenori Tanaka
- | Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning (Poster) | Alexey Rukhovich · Alexander Podolskiy · Irina Piontkovskaya
- | Transformers are Efficient Compilers, Provably (Poster) | Xiyu Zhai · Runlong Zhou · Liao Zhang · Simon Du
- | Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study (Poster) | Pengfei He · Yingqian Cui · Han Xu · Hui Liu · Makoto Yamada · Jiliang Tang · Yue XING
- | Towards characterizing the value of edge embeddings in Graph Neural Networks (Poster) | Dhruv Rohatgi · Tanya Marwah · Zachary Lipton · Jianfeng Lu · Ankur Moitra · Andrej Risteski
- | Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models (Poster) | Sahar Rajabi · Sirisha Rambhatla
- | Benign Overfitting in Out-of-Distribution Generalization of Linear Models (Poster) | Shange Tang · Jiayun Wu · Jianqing Fan · Chi Jin
- | Dynamics of Concept Learning and Compositional Generalization (Poster) | Yongyi Yang · Core Francisco Park · Ekdeep S Lubana · Maya Okawa · Wei Hu · Hidenori Tanaka
- | Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs (Poster) | Jonas Hübotter · Sascha Bongni · Ido Hakimi · Andreas Krause
- | Declarative characterizations of direct preference alignment algorithms (Poster) | Kyle Richardson · Vivek Srikumar · Ashish Sabharwal
- | Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets (Poster) | Yuandong Tian
- | Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks (Poster) | Ouns El Harzli · Bernardo Grau
- | From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks (Poster) | Clémentine Dominé · Nicolas Anguita · Alexandra Proca · Lukas Braun · Daniel Kunin · Pedro A.M Mediano · Andrew Saxe
- | Geometric Deep Learning with Quasiconformal Neural Networks: An Introduction (Poster) | Nico Alvarado · Hans Lobel
- | Sample compression unleashed: New generalization bounds for real valued losses (Poster) | Mathieu Bazinet · Valentina Zantedeschi · Pascal Germain
- | Increasing Fairness via Combination with Learning Guarantees (Poster) | Yijun Bian · Kun Zhang
- | Simple and Effective Masked Diffusion Language Models (Poster) | Subham Sahoo · Marianne Arriola · Aaron Gokaslan · Yair Schiff · Edgar Marroquin · Justin Chiu · Alexander Rush · Volodymyr Kuleshov
- | Convergence Properties of Hyperbolic Neural Networks on Riemannian Manifolds (Poster) | Nico Alvarado · Sebastian Burgos
- | Understanding Factual Recall in Transformers via Associative Memories (Poster) | Eshaan Nichani · Jason Lee · Alberto Bietti
- | Leveraging Intermediate Neural Collapse: Fixing Layers Beyond Effective Depth to Simplex ETFs for Efficient Deep Neural Networks (Poster) | Emily Liu
- | A Theory of Initialisation's Impact on Specialisation (Poster) | Devon Jarvis · Sebastian Lee · Clémentine Dominé · Andrew Saxe · Stefano Sarao Mannelli
- | An empirical study of the $(L_0, L_1)$-smoothness condition (Poster) | Y Cooper
- | Diffusion Models With Learned Adaptive Noise Processes (Poster) | Subham Sahoo · Aaron Gokaslan · Christopher De Sa · Volodymyr Kuleshov
- | Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift (Poster) | Mitsuhiro Fujikawa · Youhei Akimoto · Jun Sakuma · Kazuto Fukuchi
- | A Theoretical Framework for Federated Domain Generalization with Gradient Alignment (Poster) | Mahdiyar Molahasani · Milad Soltany · Farhad Pourpanah · Michael Greenspan · Ali Etemad
- | Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs (Poster) | Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei
- | In-Context Learning by Linear Attention: Exact Asymptotics and Experiments (Poster) | Yue Lu · Mary Letey · Jacob Zavatone-Veth · Anindita Maiti · Cengiz Pehlevan
- | The Crucial Role of Samplers in Online Direct Preference Optimization (Poster) | Ruizhe Shi · Runlong Zhou · Simon Du
- | Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization (Poster) | Matan Schliserman · Tomer Koren
- | Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error (Poster) | Ally Du · Lin Yang · Ruosong Wang
- | Exploring Task Affinities through NTK Alignment and Early Training Dynamics in Multi-Task Learning (Poster) | Yoann Morello · Emilie Grégoire · Sam Verboven
- | Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models (Poster) | Yuda Song · Hanlin Zhang · Udaya Ghai · Carson Eisenach · Sham Kakade · Dean Foster
- | Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues (Poster) | Riccardo Grazzi · Julien Siems · Jörg Franke · Arber Zela · Frank Hutter · Massimiliano Pontil
- | Transformers Provably Solve Parity Efficiently with Chain of Thought (Poster) | Juno Kim · Taiji Suzuki
- | Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers (Poster) | Yibo Jiang · Goutham Rajendran · Pradeep Ravikumar · Bryon Aragam
- | A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules (Poster) | Kairong Luo · Haodong Wen · Shengding Hu · Zhenbo Sun · Zhiyuan Liu · Maosong Sun · Kaifeng Lyu · Wenguang Chen
- | Algorithmic Stability of Minimum-Norm Interpolating Deep Neural Networks (Poster) | Ouns El Harzli · yoonsoo nam · Ilja Kuzborskij · Bernardo Grau · Ard Louis
- | Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization (Poster) | Noam Razin · Sadhika Malladi · Adithya Bhaskar · Danqi Chen · Sanjeev Arora · Boris Hanin
- | Can Bayesian Neural Networks Make Confident Predictions? (Poster) | Katharine Fisher
- | Provable unlearning in topic modeling and downstream tasks (Poster) | Stanley Wei · Sadhika Malladi · Sanjeev Arora · Amartya Sanyal
- | Implicit Bias of Adam versus Gradient Descent in One-Hidden-Layer Neural Networks (Poster) | Bhavya Vasudeva · Vatsal Sharan · Mahdi Soltanolkotabi
- | From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency (Poster) | Kaiyue Wen · Huaqing Zhang · Hongzhou Lin · Jingzhao Zhang
- | HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks (Poster) | Yongyi Yang · Jiaming Yang · Wei Hu · Michal Derezinski
- | Parameter Symmetry and Emergence of Noise Equilibrium in Stochastic Training (Poster) | Liu Ziyin · Mingze Wang · Hongchao Li · Lei Wu
- | Improving the Gaussian Approximation in Neural Networks: Para-Gaussians and Edgeworth Expansions (Poster) | Mihai Nica · Janosch Ortmann
- | Mixture of Parrots: Mixtures of experts improve memorization more than reasoning (Poster) | Samy Jelassi · Clara Mohri · David Brandfonbrener · Alex Gu · Nikhil Vyas · Nikhil Anand · David Alvarez-Melis · Yuanzhi Li · Sham Kakade · Eran Malach
- | Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model (Poster) | Siyu Chen · Beining Wu · Miao Lu · Zhuoran Yang · Tianhao Wang
- | Label Noise: Ignorance Is Bliss (Poster) | Yilun Zhu · Jianxin Zhang · Aditya Gangrade · Clay Scott
- | Optimal Protocols for Continual Learning via Statistical Physics and Control Theory (Poster) | Francesco Mori · Stefano Sarao Mannelli · Francesca Mignacco
- | How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework (Poster) | Yinuo Ren · Haoxuan Chen · Grant Rotskoff · Lexing Ying
- | Accumulating Data Avoids Model Collapse (Poster) | Joshua Kazdan · Apratim Dey · Rylan Schaeffer · Matthias Gerstgrasser · Rafael Rafailov · David Donoho · Sanmi Koyejo
- | Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks (Poster) | Nikolaos Tsilivis · Gal Vardi · Julia Kempe
- | Robust Feature Learning for Multi-Index Models in High Dimensions (Poster) | Alireza Mousavi-Hosseini · Adel Javanmard · Murat Erdogdu
- | Classifier-Free Guidance is a Predictor-Corrector (Poster) | Arwen Bradley · Preetum Nakkiran
- | Towards Principled Graph Transformers (Poster) | Luis Müller · Daniel Kusuma · Blai Bonet · Christopher Morris