Workshop
Vision Transformers: Theory and applications
Fahad Shahbaz Khan · Gul Varol · Salman Khan · Ping Luo · Rao Anwer · Ashish Vaswani · Hisham Cholakkal · Niki Parmar · Joost van de Weijer · Mubarak Shah
Virtual
Thu 8 Dec, 11 p.m. PST
Transformer models have demonstrated excellent performance on a diverse set of computer vision applications ranging from classification to segmentation on various data modalities such as images, videos, and 3D data. The goal of this workshop is to bring together computer vision and machine learning researchers working towards advancing the theory, architecture, and algorithmic design for vision transformer models, as well as the practitioners utilizing transformer models for novel applications and use cases.
The workshop’s motivation is to narrow the gap between research advances in transformer design and the use of transformers in computer vision applications. The workshop also aims to widen the adoption of transformer models in vision-related industrial applications. We are interested in papers that report experimental results on the use of transformers for any computer vision application, the challenges encountered, and the strategies used to mitigate them, on topics including, but not limited to, image classification, object detection, segmentation, human-object interaction detection, and scene understanding based on 3D, video, and multimodal inputs.
Schedule
Thu 11:00 p.m. - 11:10 p.m. | Opening Remarks
Thu 11:10 p.m. - 11:40 p.m. | [First Invited Talk] Ming Hsuan Yang
Thu 11:40 p.m. - 11:55 p.m. | CLUDA: Contrastive Learning in Unsupervised Domain Adaptation for Semantic Segmentation ([1st] Oral Presentation) | Midhun Vayyat · Kasi Jaswin · Anuraag Bhattacharya · Shuaib Ahmed · Rahul Tallamraju
Thu 11:40 p.m. - 1:10 a.m. | [1st] Oral Presentation
Thu 11:55 p.m. - 12:10 a.m. | PatchBlender: A Motion Prior for Video Transformers ([1st] Oral Presentation) | Gabriele Prato · Yale Song · Janarthanan Rajendran · R Devon Hjelm · Neel Joshi · Sarath Chandar
Fri 12:10 a.m. - 12:25 a.m. | Bi-Directional Self-Attention for Vision Transformers ([1st] Oral Presentation) | George Stoica · Taylor Hearn · Bhavika Devnani · Judy Hoffman
Fri 12:25 a.m. - 12:40 a.m. | Video based Object 6D Pose Estimation using Transformers ([1st] Oral Presentation) | Apoorva Beedu · Huda Alamri · Irfan Essa
Fri 12:40 a.m. - 12:55 a.m. | End-to-end Multimodal Representation Learning for Video Dialog ([1st] Oral Presentation) | Huda Alamri · Apoorva Beedu · Irfan Essa · Anthony Bilic · Michael Hu
Fri 12:55 a.m. - 1:10 a.m. | Continual Transformers: Redundancy-Free Attention for Online Inference ([1st] Oral Presentation) | Lukas Hedegaard · Arian Bakhtiarnia · Alexandros Iosifidis
Fri 1:10 a.m. - 1:40 a.m. | Break (1st Break)
Fri 1:40 a.m. - 2:30 a.m. | On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition ([1st] Poster session) | Farrukh Rahman · Ömer Mubarek · Zsolt Kira
Fri 1:40 a.m. - 2:30 a.m. | Fully-attentive and interpretable: vision and video vision transformers for pain detection ([1st] Poster session) | Giacomo Fiorentini · Itir Onal Ertugrul · Albert Ali Salah
Fri 1:40 a.m. - 2:30 a.m. | DynamicViT: Making Vision Transformer faster through layer skipping ([1st] Poster session) | Amanuel Mersha · Samuel Assefa
Fri 1:40 a.m. - 2:30 a.m. | FQDet: Fast-converging Query-based Detector ([1st] Poster session) | Cédric Picron · Punarjay Chakravarty · Tinne Tuytelaars
Fri 1:40 a.m. - 2:30 a.m. | [1st] Poster session
Fri 2:30 a.m. - 3:00 a.m. | [2nd Invited Talk] Cordelia Schmid
Fri 3:00 a.m. - 3:30 a.m. | [3rd Invited Talk] Rita Cucchiara
Fri 3:30 a.m. - 3:45 a.m. | Matryoshka Representations for Adaptive Deployment ([2nd] Oral Presentation) | Aniket Rege · Aditya Kusupati · Gantavya Bhatt · Matthew Wallingford · Aditya Sinha · Vivek Ramanujan · William Howard-Snyder · Kaifeng Chen · Sham Kakade · Prateek Jain · Ali Farhadi
Fri 3:30 a.m. - 4:00 a.m. | [2nd] Oral Presentation
Fri 3:45 a.m. - 4:00 a.m. | TPFNet: A Novel Text In-painting Transformer for Text Removal ([2nd] Oral Presentation) | Onkar Susladkar · Dhruv Makwana · Gayatri Deshmukh · Sparsh Mittal · Sai Chandra Teja R · Rekha Singhal
Fri 4:00 a.m. - 4:30 a.m. | [4th Invited Talk] Kristen Grauman
Fri 4:30 a.m. - 5:00 a.m. | [5th Invited Talk] Laura Leal-Taixé
Fri 5:00 a.m. - 5:10 a.m. | Coffee Break
Fri 5:10 a.m. - 5:50 a.m. | PatchRot: A Self-Supervised Technique for Training Vision Transformers ([2nd] Poster Session) | Sachin Chhabra · Prabal Bijoy Dutta · Hemanth Venkateswara · Baoxin Li
Fri 5:10 a.m. - 5:50 a.m. | Multimodal Transformer for Parallel Concatenated Variational Autoencoders ([2nd] Poster Session) | Stephen Liang · Jerry Mendel
Fri 5:10 a.m. - 5:50 a.m. | Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets ([2nd] Poster Session) | Xiangyu Chen · Ying Qin · Wenju Xu · Andrés Bur · Cuncong Zhong · Guanghui Wang
Fri 5:10 a.m. - 5:50 a.m. | Learning Explicit Object-Centric Representations with Vision Transformers ([2nd] Poster Session) | Oscar Vikström · Alexander Ilin
Fri 5:10 a.m. - 5:50 a.m. | [2nd] Poster Session
Fri 5:50 a.m. - 6:00 a.m. | Best Paper Announcement and Closing Remarks