Sun 8:45 a.m. - 9:00 a.m.
|
Welcome and Opening Remarks
(
Intro
)
>
SlidesLive Video
|
🔗
|
Sun 9:00 a.m. - 9:45 a.m.
|
Atticus Geiger: The Current State of Interpretability and Ideas for Scaling Up
(
Invited Talk
)
>
SlidesLive Video
|
Atticus Geiger
🔗
|
Sun 9:45 a.m. - 10:15 a.m.
|
Spotlight Talks
(
Spotlight Talks
)
>
|
🔗
|
Sun 9:45 a.m. - 9:51 a.m.
|
LoFiT: Localized Fine-tuning on LLM Representations
(
Spotlight Talk
)
>
SlidesLive Video
|
Fangcong Yin · Xi Ye · Greg Durrett
🔗
|
Sun 9:51 a.m. - 9:57 a.m.
|
Decomposing and Editing Predictions by Modeling Model Computation
(
Spotlight Talk
)
>
|
Harshay Shah · Andrew Ilyas · Aleksander Madry
🔗
|
Sun 9:57 a.m. - 10:03 a.m.
|
Analyzing (In)Abilities of SAEs via Formal Languages
(
Spotlight Talk
)
>
SlidesLive Video
|
Abhinav Menon · Manish Shrivastava · David Krueger · Ekdeep S Lubana
🔗
|
Sun 10:03 a.m. - 10:09 a.m.
|
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
(
Spotlight Talk
)
>
SlidesLive Video
|
Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
🔗
|
Sun 10:09 a.m. - 10:15 a.m.
|
Probing the Decision Boundaries of In-context Learning in Large Language Models
(
Spotlight Talk
)
>
SlidesLive Video
|
Siyan Zhao
🔗
|
Sun 10:15 a.m. - 10:45 a.m.
|
Coffee Break
|
🔗
|
Sun 10:45 a.m. - 11:30 a.m.
|
Fernanda Viégas: AI Dashboard Design: A User-Centered Approach to Interpretability
(
Invited Talk
)
>
SlidesLive Video
|
Fernanda Viégas
🔗
|
Sun 11:30 a.m. - 12:00 p.m.
|
Junior Panel Discussion
(
Panel Discussion
)
>
SlidesLive Video
|
🔗
|
Sun 12:00 p.m. - 1:00 p.m.
|
Lunch Break
|
🔗
|
Sun 1:00 p.m. - 2:00 p.m.
|
Poster Session
(
Poster Session
)
>
|
🔗
|
Sun 2:00 p.m. - 2:45 p.m.
|
David Ha: The Future of Collective Intelligence and Meta Evolution for Foundation Models
(
Invited Talk
)
>
SlidesLive Video
|
David Ha
🔗
|
Sun 2:45 p.m. - 3:15 p.m.
|
Coffee Break
|
🔗
|
Sun 3:15 p.m. - 4:00 p.m.
|
Jacob Steinhardt: Scalably Understanding AI with AI
(
Invited Talk
)
>
SlidesLive Video
|
Jacob Steinhardt
🔗
|
Sun 4:00 p.m. - 4:55 p.m.
|
Panel Discussion
(
Panel Discussion
)
>
SlidesLive Video
|
Fernanda Viégas · Neel Nanda · Atticus Geiger · Jacob Steinhardt
🔗
|
Sun 4:55 p.m. - 5:00 p.m.
|
Closing Remarks and Award Ceremony
(
Outro
)
>
SlidesLive Video
|
🔗
|
-
|
Overcoming Limitations of Steering Vectors with Low-Rank Representation Steering
(
Poster
)
>
link
|
Dmitrii Krasheninnikov · David Krueger
🔗
|
-
|
Do LLMs internally "know" when they follow instructions?
(
Poster
)
>
link
|
Juyeon Heo · Christina Heinze-Deml · Shirley Ren · Oussama Elachqar · Udhyakumar Nallasamy · Andy Miller · Jaya Narain
🔗
|
-
|
LoFiT: Localized Fine-tuning on LLM Representations
(
Poster
)
>
link
|
Fangcong Yin · Xi Ye · Greg Durrett
🔗
|
-
|
Ablation is Not Enough to Emulate DPO: A Mechanistic Analysis of Toxicity Reduction
(
Poster
)
>
link
|
Yushi Yang · Filip Sondej · Harry Mayne · Adam Mahdi
🔗
|
-
|
Is Free Self-Alignment Possible?
(
Poster
)
>
link
|
Dyah Adila · Changho Shin · Yijing Zhang · Frederic Sala
🔗
|
-
|
Steering semantic search with interpretable features from sparse autoencoders
(
Poster
)
>
link
|
Christine Ye · Charles O'Neill · John Wu · Kartheik Iyer
🔗
|
-
|
Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering
(
Poster
)
>
link
|
Ido Sobol · Chenfeng Xu · Or Litany
🔗
|
-
|
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
(
Poster
)
>
link
|
Joris Postmus · Steven Abreu
🔗
|
-
|
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
(
Poster
)
>
link
|
Marc Canby · Adam Davies · Chirag Rastogi · Julia C Hockenmaier
🔗
|
-
|
Uncovering Uncertainty in Transformer Inference
(
Poster
)
>
link
|
Greyson Brothers · Willa Mannering · John Winder · Amber Tien
🔗
|
-
|
Algorithmic Oversight for Deceptive Reasoning
(
Poster
)
>
link
|
Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
🔗
|
-
|
Probing the Decision Boundaries of In-context Learning in Large Language Models
(
Poster
)
>
link
|
Siyan Zhao · Tung Nguyen · Aditya Grover
🔗
|
-
|
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
(
Poster
)
>
link
|
Madeline Brumley · Joe Kwon · David Krueger · Dmitrii Krasheninnikov · Usman Anwar
🔗
|
-
|
Linearly Controlled Language Generation with Performative Guarantees
(
Poster
)
>
link
|
Emily Cheng · Marco Baroni · Carmen Amo Alonso
🔗
|
-
|
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
(
Poster
)
>
link
|
Zexuan Qiu · Zijing Ou · Bin Wu · Jingjing Li · Aiwei Liu · Irwin King
🔗
|
-
|
Toward Explanation Bottleneck Models
(
Poster
)
>
link
|
Shin'ya Yamaguchi · Kosuke Nishida
🔗
|
-
|
Can sparse autoencoders be used to decompose and interpret steering vectors?
(
Poster
)
>
link
|
Harry Mayne · Yushi Yang · Adam Mahdi
🔗
|
-
|
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
(
Poster
)
>
link
|
Peng Wang · Zexi Li · Ningyu Zhang · Ziwen Xu · Yunzhi Yao · Yong Jiang · Pengjun Xie · Fei Huang · Huajun Chen
🔗
|
-
|
Representation Tuning
(
Poster
)
>
link
|
Christopher Ackerman
🔗
|
-
|
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
(
Poster
)
>
link
|
Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
🔗
|
-
|
Understanding Visual Concepts Across Models
(
Poster
)
>
link
|
Brandon Trabucco · Max Gurinas · Kyle Doherty · Ruslan Salakhutdinov
🔗
|
-
|
Secret Seeds in Text-to-Image Diffusion Models
(
Poster
)
>
link
|
Katherine Xu · Lingzhi Zhang · Jianbo Shi
🔗
|
-
|
Analyzing (In)Abilities of SAEs via Formal Languages
(
Poster
)
>
link
|
Abhinav Menon · Manish Shrivastava · Ekdeep S Lubana · David Krueger
🔗
|
-
|
Pay Attention to What Matters
(
Poster
)
>
link
|
Pedro Silva · Fadhel Ayed · Antonio De Domenico · Ali Maatouk
🔗
|
-
|
Decomposing and Editing Predictions by Modeling Model Computation
(
Poster
)
>
link
|
Harshay Shah · Andrew Ilyas · Aleksander Madry
🔗
|
-
|
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
(
Poster
)
>
link
|
Xinyu Zhou · Delong Chen · Samuel Cahyawijaya · Xufeng Duan · Zhenguang Cai
🔗
|
-
|
Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs
(
Poster
)
>
link
|
Jiatong Han · Jannik Kossen · Muhammed Razzak · Yarin Gal
🔗
|
-
|
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
(
Poster
)
>
link
|
Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
🔗
|
-
|
Unveiling and Manipulating Concepts in Time Series Foundation Models
(
Poster
)
>
link
|
Michal Wilinski · Mononito Goswami · Nina Żukowska · Willa Potosnak · Artur Dubrawski
🔗
|
-
|
GPT-2 Small Fine-Tuned on Logical Reasoning Summarizes Information on Punctuation Tokens
(
Poster
)
>
link
|
Sonakshi Chauhan · Atticus Geiger
🔗
|
-
|
Extracting Paragraphs from LLM Token Activations
(
Poster
)
>
link
|
Nicky Pochinkov · Angelo Benoit · Lovkush Agarwal · Zainab Ali Majid · Lucile Ter-Minassian
🔗
|
-
|
Analysing the Residual Stream of Language Models Under Knowledge Conflicts
(
Poster
)
>
link
|
Yu Zhao · Xiaotang Du · Giwon Hong · Aryo Gema · Alessio Devoto · Hongru WANG · Xuanli He · Kam-Fai Wong · Pasquale Minervini
🔗
|
-
|
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks
(
Poster
)
>
link
|
Gregory Kang Ruey Lau · Wenyang Hu · Liu Diwen · Chen Jizhuo · See-Kiong Ng · Bryan Kian Hsiang Low
🔗
|