Workshop
5th Workshop on Self-Supervised Learning: Theory and Practice
XuDong Wang · Ishan Misra · Mathilde Caron · Tengda Han · Pengtao Xie
West Meeting Room 202-204
Sat 14 Dec, 8:30 a.m. PST
At NeurIPS from 2020 to 2023, we successfully organized the 1st, 2nd, 3rd, and 4th workshops on Self-Supervised Learning – Theory and Practice. These events attracted a diverse audience from multiple domains, including vision, speech, NLP, robotics, ML theory, and industry practitioners. Building on the success of these previous workshops, we are excited to continue organizing the workshop on self-supervised learning this year.

Self-supervised learning (SSL) is an approach to representation learning that does not rely on human-labeled data. Instead, it creates auxiliary tasks from unlabeled input data and learns representations by solving these tasks. SSL has shown significant success across various domains such as images (e.g., MAE, DINO, MoCo, PIRL, SimCLR), speech (e.g., wav2vec, Whisper), and text (e.g., BERT, GPT, Llama). It has also demonstrated promising results in other data modalities, including graphs, time series, and audio. Recent large language models, predominantly trained on web-scale data using self-supervised methods, have exhibited remarkable generalizability and are beginning to transform numerous research fields. Without using human-provided labels, SSL can achieve performance comparable to, or even surpassing, that of fully supervised methods. Furthermore, generative SSL techniques such as Imagen, Stable Diffusion, and Sora have significantly enhanced the artistic capabilities of AI models.

Existing research on SSL has primarily concentrated on enhancing empirical performance without substantial theoretical underpinnings. Although SSL approaches are empirically effective across various benchmarks, their theoretical foundations and practical applications remain less explored. Key questions are still largely unanswered: why certain auxiliary tasks yield better representations than others, how much unlabeled data is needed to learn effective representations, how neural architectures affect SSL performance, and in which practical scenarios SSL outperforms supervised models.

Our workshop aims to address these gaps by fostering a dialogue between theory and practice, especially in the context of LLMs. We plan to gather researchers interested in SSL from diverse fields to explore the theoretical bases of empirically successful SSL methods and to discuss how these theoretical insights could further enhance SSL's practical performance. This workshop will differentiate itself from previous SSL-related workshops by prioritizing the establishment of theoretical foundations and by providing theoretical frameworks to guide the development of new SSL methods. Additionally, we will attempt to close the loop from practice to theory by inviting practitioners to share their experiences and insights regarding the practical advantages and challenges of using SSL.
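To make the notion of an auxiliary (pretext) task concrete, the short sketch below shows one widely used instance, a SimCLR-style contrastive objective (NT-Xent), in PyTorch. It is a minimal illustration rather than any speaker's or paper's implementation: the batch size, embedding dimension, and temperature are placeholder values, and an image encoder producing the two sets of embeddings is assumed but not shown.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N inputs,
    # produced by an encoder that is assumed but not shown here.
    z = torch.cat([F.normalize(z1, dim=1), F.normalize(z2, dim=1)], dim=0)  # (2N, D)
    sim = z @ z.t() / temperature  # pairwise cosine similarities as logits
    n = z1.size(0)
    # Exclude self-similarity so a sample cannot be its own positive.
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    # The positive for view i is the other view of the same input (index i+n or i-n).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)  # pull positives together, push negatives apart

# Placeholder usage: random tensors stand in for encoder outputs on two augmentations.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))

The same recipe, with different pretext tasks and losses, underlies many of the methods named above, for example masked prediction in MAE and BERT, or next-token prediction in GPT-style models.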
Schedule
Sat 8:30 a.m. - 8:55 a.m. | Poster Setup (Poster)
Sat 9:00 a.m. - 9:15 a.m. | Opening Remarks (Intro) | XuDong Wang
Sat 9:15 a.m. - 9:45 a.m. | Sherry Yang (Google DeepMind): Self-Supervised World Modeling from Internet Data (Invited Talk)
Sat 9:45 a.m. - 10:15 a.m. | Pauline Luc (Google DeepMind): Self-supervision for General Video Understanding Beyond Semantics (Invited Talk)
Sat 10:15 a.m. - 10:30 a.m. | Short Break
Sat 10:30 a.m. - 11:00 a.m. | Hanna Hajishirzi (University of Washington & AI2): OLMo & Molmo: Open Textual and Visual Language Models (Invited Talk)
Sat 11:00 a.m. - 11:30 a.m. | Hilde Kuehne (Univ. of Tuebingen & MIT-IBM Watson AI Lab): Advances in Self-supervised Multimodal Learning (Invited Talk)
Sat 11:30 a.m. - 11:45 a.m. | In-Context Symmetries: Self-Supervised Learning through Contextual World Models (Oral) | Sharut Gupta · Chenyu Wang · Yifei Wang · Tommi Jaakkola · Stefanie Jegelka
Sat 11:45 a.m. - 12:00 p.m. | A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning (Oral) | Khimya Khetarpal · Daniel (Zhaohan) Guo · Bernardo Avila Pires · Yunhao Tang · Clare Lyle · Mark Rowland · Nicolas Heess · Diana Borsa · Arthur Guez · Will Dabney
Sat 12:30 p.m. - 1:50 p.m. | Poster Session
Sat 2:00 p.m. - 2:30 p.m. | Trevor Darrell (UC Berkeley): From Unsupervised Segmentation to Visual Prompting (Invited Talk)
Sat 2:30 p.m. - 3:00 p.m. | Alan Yuille (Johns Hopkins University): Supervision of 3D-aware Models by Synthetic Data (Invited Talk)
Sat 3:00 p.m. - 3:30 p.m. | Phillip Isola (MIT): Representation Learning from Human Feedback (Invited Talk)
Sat 3:30 p.m. - 4:00 p.m. | Lili Yu (FAIR, Meta): Paths Towards Deep Fused Multimodal Modeling (Invited Talk)
Sat 4:00 p.m. - 4:30 p.m. | Ziwei Liu (Nanyang Technological University): From High-fidelity 3D Generative Models to Dynamic Embodied Learning (Invited Talk)
Sat 4:30 p.m. - 4:45 p.m. | Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? (Oral) | Mark Ibrahim · David Klindt · Randall Balestriero
Sat 4:45 p.m. - 5:00 p.m. | Neural Embedding Ranks: Aligning 3D latent dynamics with movement for long-term decoding (Oral) | Chenggang Chen · Zhiyu Yang
- Self-Supervised Pretext Tasks for Event Sequence Data from Detecting Misalignment (Poster) | Yimu Wang · He Zhao · Ruizhi Deng · Fred Tung · Greg Mori
- Masked Self-Supervised Pretraining for Semantic Segmentation of Dental Radiographs (Poster) | Tejeswar Pokuri · Laalenthika Konthalapalli · Sarvesh Kumar · Karthik S.
- For Perception Tasks: The Cost of LLM Pretraining by Next-Token Prediction Outweigh its Benefits (Poster) | Randall Balestriero · Hai Huang
- Variational Graph Contrastive Learning (Poster) | shifeng xie · Jhony H. Giraldo
- PabLO: Improving Semi-Supervised Learning with Pseudolabeling Optimization (Poster) | Harit Vishwakarma · Yi Chen · Satya Sai Srinath Namburi · Sui Jiet Tay · Ramya Korlakai Vinayak · Frederic Sala
- Two Is Better Than One: Aligned Clusters Improve Anomaly Detection (Poster) | Alain Ryser · Thomas Sutter · Alexander Marx · Julia Vogt
- A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning (Poster) | Khimya Khetarpal · Daniel (Zhaohan) Guo · Bernardo Avila Pires · Yunhao Tang · Clare Lyle · Mark Rowland · Nicolas Heess · Diana Borsa · Arthur Guez · Will Dabney
- Squeezing Water from a Stone: Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy Maximization (Poster) | Deep Chakraborty · Tim G. J. Rudner · Erik Learned-Miller
- Boosting Unsupervised Segmentation Learning (Poster) | Alp Eren SARI · Francesco Locatello · Paolo Favaro
- Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? (Poster) | Mark Ibrahim · David Klindt · Randall Balestriero
- Benchmarking Self-Supervised Learning for Single-Cell Data (Poster) | Philip Toma · Olga Ovcharenko · Imant Daunhawer · Julia Vogt · Florian Barkmann · Valentina Boeva
- Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction (Poster) | Qin Wang · Kai Krajsek · Hanno Scharr
- Squeezing performance from pathology foundation models with chained hyperparameter searches (Poster) | Joseph Cappadona · Ken Zeng · Carlos Fernandez-Granda · Jan Witowski · Yann LeCun · Krzysztof Geras
- Leveraging Audio and Visual Recurrence for Unsupervised Video Highlight Detection (Poster) | Md Zahidul Islam · Sujoy Paul · Mrigank Rochan
- Test-Time Adaptation for Video Highlight Detection (Poster) | Md Zahidul Islam · Sujoy Paul · Mrigank Rochan
- Decoupling Vertical Federated Learning using Local Self-Supervision (Poster) | Avi Amalanshu · Yash Sirvi · David Inouye
- Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps (Poster) | Xue Xia · Randall Balestriero · Tao Zhang · Lorenz Hurni
- Neural Embeddings Rank: Aligning 3D latent dynamics with movements (Poster) | Chenggang Chen · Zhiyu Yang
- An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions (Poster) | Theo Clark · Benedetta Cevoli · Eloy de Jong · Timofey Abramski · Jamie Dougherty
- Self-Supervised Bisimulation Action Chunk Representation for Efficient RL (Poster) | Lei Shi · Jianye Hao · Hongyao Tang · Zibin Dong · YAN ZHENG
- In-Context Symmetries: Self-Supervised Learning through Contextual World Models (Poster) | Sharut Gupta · Chenyu Wang · Yifei Wang · Tommi Jaakkola · Stefanie Jegelka
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning (Poster) | Bokun Wang · Yunwen Lei · Yiming Ying · Tianbao Yang
- Anomaly Detection In The Wild: Can SSL Handle Strong Distribution Imbalances? (Poster) | Daniel Otero · Rafael Mateus · Randall Balestriero
- EmbedSimScore: Advancing Protein Similarity Analysis with Structural and Contextual Embeddings (Poster) | Gourab Saha · Toki Tahmid · Md. Shamsuzzoha Bayzid
- Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles (Poster) | Shuman Peng · Arash Khoeini · Sharan Vaswani · Martin Ester
- MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations (Poster) | Benedikt Alkin · Lukas Miklautz · Sepp Hochreiter · Johannes Brandstetter
- Adaptive Neighborhoods in Contrastive Regression Learning for Brain Age Prediction (Poster) | Jakob Träuble · Lucy Hiscox · Curtis Johnson · Carola-Bibiane Schönlieb · Gabriele Schierle · Angelica Aviles-Rivero
- LLM2CLIP: Extending the Capability Boundaries of CLIP through Large Language Models (Poster) | Aoqi Wu · weiquan Huang · Yifan Yang · Xufang Luo · Yuqing Yang · Chunyu Wang · Liang Hu · Xiyang Dai · Dongdong Chen · Chong Luo · Lili Qiu
- Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL (Poster) | Ömer Çağatan · Baris Akgun
- Explainable Audio-Visual Representation Learning via Prototypical Contrastive Masked Autoencoder (Poster) | Yi Li · Plamen P Angelov
- DIETing: Self-Supervised Learning with Instance Discrimination Learns Identifiable Features (Poster) | Attila Juhos · Alice Bizeul · Patrik Reizinger · Randall Balestriero · David Klindt · Mark Ibrahim · Julia Vogt · Wieland Brendel
- Intra-video Positive Pairs in Self-Supervised Learning for Ultrasound (Poster) | Blake VanBerlo · Alexander Wong · Jesse Hoey · Robert Arntfield
- Robust Self-Supervised Learning for Adversarial Attack Detection (Poster) | Yi Li · Plamen P Angelov · Neeraj Suri
- Data Augmentation Transformations for Self-Supervised Learning with Ultrasound (Poster) | Blake VanBerlo · Alexander Wong · Jesse Hoey · Robert Arntfield
- Unfolding Videos Dynamics via Taylor Expansion (Poster) | Siyi Chen · Minkyu Choi · Zesen Zhao · Kuan Han · Qing Qu · Zhongming Liu
- A Graph Matching Approach to Balanced Data Sub-Sampling for Self-Supervised Learning (Poster) | Hugues Van Assel · Randall Balestriero
- Time-dependent Sampling for Contrastive Self-supervised Learning of Longitudinal Biosignals Representations (Poster) | Sam Perochon · Salar Abbaspourazad · Joseph Futoma · Andy Miller · Guillermo Sapiro
- Maven: A Multimodal Foundation Model for Supernova Science (Poster) | Gemma Zhang · Thomas Helfer · Alex Gagliano · Siddharth Mishra-Sharma · V Villar
- Hoop-MSSL: Multi-Task Self-supervised Representation Learning on Basketball Spatio-Temporal Data (Poster) | xing wang · Jianchong Shao · Chunyang Huang · Zitian Tang · Miguel-Ángel GÓMEZ · Zhang shaoliang · Konstantinos Pelechrinis
- The Birth of Self Supervised Learning: A Supervised Theory (Poster) | Randall Balestriero · Yann LeCun
- Unsupervised Event Outlier Detection in Continuous Time (Poster) | Somjit Nath · Kry Yik Chau Lui · Siqi Liu
- Context-Aware Predictive Coding: A Representation Learning Framework for WiFi Sensing (Poster) | Borna Barahimi · Hina Tabassum · Mohammad Omer · Omer Waqar
- Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models (Poster) | Hafez Ghaemi · Eilif B. Muller · Shahab Bakhtiari
- Representing Positional Information in Generative World Models for Object Manipulation (Poster) | Stefano Ferraro · Pietro Mazzaglia · Tim Verbelen · Bart Dhoedt · Sai Rajeswar Mudumba
- Influence Estimation in Self-Supervised Learning (Poster) | Nidhin Harilal · Reza Akbarian Bafghi · Amit Rege · Maziar Raissi · Claire Monteleoni
- Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning (Poster) | Etai Littwin · Vimal Thilak · Anand Gopalakrishnan
- TSA on AutoPilot: Self-tuning Self-supervised Time Series Anomaly Detection (Poster) | Boje Deforce · Meng-Chieh Lee · Bart Baesens · Estefanía Asensio · Jaemin Yoo · Leman Akoglu
- On the Collapse Errors Induced by the Deterministic Sampler for Diffusion Models (Poster) | Zhang · Difan Zou
- Self-Supervised Learning of Disentangled Representations for Multivariate Time-Series (Poster) | Ching Chang · Chan Chiao-Tung · Wei-Yao Wang · Wen-Chih Peng · Tien-Fu Chen
- Pearls from Pebbles: Improved Confidence Functions for Auto-labeling (Poster) | Harit Vishwakarma · Yi Chen · Sui Jiet Tay · Satya Sai Srinath Namburi · Frederic Sala · Ramya Korlakai Vinayak
- Informed Augmentation Selection Improves Tabular Contrastive Learning (Poster) | Arash Khoeini · Shuman Peng · Martin Ester
- Self Supervised Learning Using Controlled Diffusion Image Augmentation (Poster) | Judah Goldfeder · Patrick Puma · Gabriel Guo · Gabriel Trigo · Hod Lipson
- Uncovering the Risk of Model Collapsing in Self-Supervised Continual Test-time Adaptation (Poster) | Trung Hieu Hoang · MinhDuc Vo · Minh Do
- PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling (Poster) | Junmyeong Lee · Euijun Hwang · Sukmin Cho · Jong Park
- DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks (Poster) | Wei Cui · Yi Sui · Jesse Cresswell · Keyvan Golestan
- When Do We Not Need Larger Vision Models? (Poster) | Baifeng Shi · Ziyang Wu · Maolin Mao · Xin Wang · Trevor Darrell
- SigCLR: Sigmoid Contrastive Learning of Visual Representations (Poster) | Ömer Çağatan
- NARAIM: Native Aspect Ratio Autoregressive Image Models (Poster) | Daniel Gallo Fernández · Robert van der Klis · Răzvan-Andrei Matișan · Janusz Partyka · Samuele Papa · Efstratios Gavves · Phillip Lippe
- $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs (Poster) | Uladzislau Sobal · Mark Ibrahim · Randall Balestriero · Vivien Cabannes · Diane Bouchacourt · Pietro Astolfi · Kyunghyun Cho · Yann LeCun