Fri 6:45 a.m. - 7:00 a.m. | Welcome and Opening Remarks | Remarks
Fri 7:00 a.m. - 7:30 a.m. | Data attribution for LMMs and beyond (James Zou) | In-person presentation
Fri 7:30 a.m. - 8:00 a.m. | What does scale give us: Why we are building a ladder to the moon (Sara Hooker) | In-person presentation
Fri 8:00 a.m. - 8:30 a.m. | Coffee Break and Posters
Fri 8:30 a.m. - 9:05 a.m. | Contributed papers (4 presentations) | Contributed Talk | Elan Rosenfeld · Rhys Gould · Nicholas Konz · Theodora Worledge
Fri 9:05 a.m. - 9:50 a.m. | The Future of Attribution in ML (Panel) | Discussion Panel
Fri 9:50 a.m. - 11:00 a.m. | Lunch
Fri 11:00 a.m. - 12:00 p.m. | Poster Session #1 | Poster Session
Fri 12:00 p.m. - 12:30 p.m. | What Neural Networks Memorize and Why (Vitaly Feldman) | In-person presentation
Fri 12:30 p.m. - 1:00 p.m. | Evaluation Beyond Task Performance (Milad Nasr) | In-person presentation
Fri 1:00 p.m. - 2:00 p.m. | Poster Session #2 | Poster Session
Fri 1:00 p.m. - 1:30 p.m. | Coffee Break and Posters
Fri 2:00 p.m. - 2:30 p.m. | Understanding LLMs via their Generative Successes and Shortcomings (Swabha Swayamdipta) | In-person presentation
Fri 2:30 p.m. - 3:00 p.m. | Talk by Sanjeev Arora | In-person presentation
Fri 3:00 p.m. - 3:30 p.m. | Poster Session #3 & Closing Remarks | Poster Session
Irreducible Curriculum for Language Model Pretraining | Poster | Simin Fan · Martin Jaggi
Evaluating the Utility of Model Explanations for Model Development | Poster | Shawn Im · Jacob Andreas · Yilun Zhou
Why do landscape diagnostics matter? Pinpointing the failure mode of generalization | Poster | Yefan Zhou · Jianlong Chen · Qinxue Cao · Konstantin Schürholt · Yaoqing Yang
The Importance of Prompt Tuning for Automated Neuron Explanations | Poster | Justin Lee · Tuomas Oikarinen · Arjun Chatha · Keng-Chi Chang · Yilan Chen · Lily Weng
Copy Suppression: Comprehensively Understanding an Attention Head | Poster | Callum McDougall · Arthur Conmy · Cody Rushing · Tom McGrath · Neel Nanda
Does It Know?: Probing and Benchmarking Uncertainty in Language Model Latent Beliefs | Poster | Brian Huang · Joe Kwon
Attribution Patching Outperforms Automated Circuit Discovery | Poster | Aaquib Syed · Can Rager · Arthur Conmy
On the Support Vector Effect in DNNs: Rethinking Last Layer Sensitivity-based Instance Attribution | Poster | Syed Hasan Amin Mahmood · Rajiv Khanna
Training Dynamics of Contextual N-Grams in Language Models | Poster | Lucia Quirke · Lovis Heindrich · Wes Gurnee · Neel Nanda
SPADE: Sparsity-Guided Debugging for Deep Neural Networks | Poster | Arshia Soltani Moakhar · Eugenia Iofinova · Dan Alistarh
In Search of a Data Transformation that Accelerates Neural Field Training | Poster | Junwon Seo · Sangyoon Lee · Jaeho Lee
Automatic Discovery of Visual Circuits | Poster | Achyuta Rajaram · Neil Chowdhury · Antonio Torralba · Jacob Andreas · Sarah Schwettmann
Mining the Diamond Miner: Mechanistic Interpretability on the Video PreTraining Agent | Poster | Sonia Joseph · Artem Zholus · Mohammad Reza Samsami · Blake Richards
Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation (Workshop Version) | Poster | Jiachen (Tianhao) Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal
Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study | Poster | Karolis Ramanauskas · Özgür Şimşek
Adversarial Attacks on Neuron Interpretation via Activation Maximization | Poster | Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle | Poster | Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Ila Fiete · Andrey Gromov · Sanmi Koyejo
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" | Poster | Lukas Berglund · Meg Tong · Maximilian Kaufmann · Mikita Balesni · Asa Cooper Stickland · Tomasz Korbak · Owain Evans
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets | Poster | Samuel Marks · Max Tegmark
Language Models Linearly Represent Sentiment | Poster | Curt Tigges · Oskar John Hollinsworth · Atticus Geiger · Neel Nanda
Efficient Data Valuation for Weighted Nearest Neighbor Algorithms | Poster | Jiachen (Tianhao) Wang · Ruoxi Jia
How do language models bind entities in context? | Poster | Jiahai Feng · Jacob Steinhardt
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching | Poster | Aleksandar Makelov · Georg Lange · Atticus Geiger · Neel Nanda
Object Detection in Deep Neural Networks Differs from Humans in the Periphery | Poster | Anne Harrington · Vasha DuTell · Mark Hamilton · Ayush Tewari · Simon Stent · Bill Freeman · Ruth Rosenholtz
Risk Aversion of Online Learning Algorithms | Poster | Andreas Haupt · Aroon Narayanan
Tell, Don't Show: Internalized Reasoning influences how LLMs generalize | Poster | Alexander Meinke · Owain Evans
Formal Definition of Fingerprints Improves Attribution of Generative Models | Poster | Hae Jin Song · Mahyar Khayatkhoei · Wael Abd-Almageed
Attributing Learned Concepts in Neural Networks to Training Data | Oral | Nicholas Konz · Charles Godfrey · Madelyn Shapiro · Jonathan Tu · Henry Kvinge · Davis Brown
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale | Poster | Max Marion · Ahmet Üstün · Luiza A Pozzobon · Alex Wang · Marzieh Fadaee · Sara Hooker
A Simple and Efficient Baseline for Data Attribution on Images | Poster | Vasu Singla · Pedro Sandoval-Segura · Micah Goldblum · Jonas Geiping · Tom Goldstein
Shapley Interactions for Complex Feature Attribution | Poster | Divyansh Singhvi · Andrej Erkelens · Raghav Jain · Diganta Misra · Naomi Saphra
Sparse Autoencoders Find Highly Interpretable Features in Language Models | Poster | Hoagy Cunningham · Aidan Ewart · Logan Smith · Robert Huben · Lee Sharkey
Successor Heads: Recurring, Interpretable Attention Heads In The Wild | Oral | Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
Exploring Dataset-Scale Indicators of Data Quality | Poster | Benjamin Feuer · Chinmay Hegde
Self-Select: Optimizing Instruction Selection for Large Language Models | Poster | Alexander Kyimpopkin · Keshav Ramji
Speculative Behavior: An Approach to Large Language Model Evaluation and Optimization | Poster | Hernan C. Vazquez · Jorge Sánchez · Rafael Carrascosa
Unifying Corroborative and Contributive Attributions in Large Language Models | Oral | Theodora Worledge · Judy Hanwen Shen · Nicole Meister · Caleb Winston · Carlos Guestrin
Algorithm Selection with Priority Order for Instances | Poster | Zhamilya Saparova · Martin Lukac
Better than Balancing: Debiasing through Data Attribution | Poster | Saachi Jain · Kimia Hamidieh · Kristian Georgiev · Marzyeh Ghassemi · Aleksander Madry
Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability | Poster | Arush Tagade · Jessica Rumbelow
Backtracking Mathematical Reasoning of Language Models to the Pretraining Data | Poster | Yasaman Razeghi · Hamish Ivison · Sameer Singh · Yanai Elazar
Intriguing Properties of Data Attribution on Diffusion Models | Poster | Xiaosen Zheng · Tianyu Pang · Chao Du · Jing Jiang · Min Lin
Forbidden Facts: An Investigation of Competing Objectives in Llama 2 | Poster | Tony Wang · Miles Wang · Kaivalya Hariharan · Nir Shavit
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods | Poster | Fred Zhang · Neel Nanda
Meta- (out-of-context) learning in neural networks | Poster | Dmitrii Krasheninnikov · Egor Krasheninnikov · Bruno Mlodozeniec · David Krueger
Transformer-based Causal Language Models from a Meta-Learning Perspective | Poster | Xinbo Wu · Lav Varshney
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization | Oral | Elan Rosenfeld · Andrej Risteski
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism | Poster | Mansi Sakarvadia · Arham Khan · Aswathy Ajith · Daniel Grzenda · Nathaniel Hudson · André Bauer · Kyle Chard · Ian Foster
Estimating the Generalization in Deep Neural Networks via Sparsity | Poster | Yang Zhao · Hao Zhang · Xiuyuan Hu
Data Attribution for Segmentation Models | Poster | Albert Tam · Joshua Vendrow · Aleksander Madry
Summing Up the Facts: Additive Mechanisms behind Factual Recall in LLMs | Poster | Bilal Chughtai · Alan Cooney · Neel Nanda