Posters in this session:
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
On the Computational Complexity of Inverting Generative Models
Flow-Based High-Dimensional Distributionally Robust Optimization
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
A Theoretical Explanation of Deep RL Performance in Stochastic Environments
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line
Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice
SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
On Compositionality and Emergence in Physical Systems Generative Modeling
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
The Expressive Power of Transformers with Chain of Thought
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
MoXCo: How I learned to stop exploring and love my local minima?
First-order ANIL provably learns representations despite overparametrisation
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Non-Vacuous Generalization Bounds for Large Language Models
Learning from setbacks: the impact of adversarial initialization on generalization performance
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Toward Student-oriented Teacher Network Training for Knowledge Distillation
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models
Why Do We Need Weight Decay for Overparameterized Deep Networks?
The Double-Edged Sword: Perception and Uncertainty in Inverse Problems
Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting
On robust overfitting: adversarial training induced distribution matters
Are Graph Neural Networks Optimal Approximation Algorithms?
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention