Poster in Affinity Workshop: Black in AI
DynamicViT: Faster Vision Transformer
Amanuel Mersha · Samuel Assefa
Keywords: [ Computer Vision ]
Recent deep learning breakthroughs in language and vision tasks can largely be attributed to large-scale transformers. Unfortunately, their massive size and high compute requirements have limited their use in resource-constrained environments. Dynamic neural networks promise to reduce compute requirements by adjusting the computational path based on the input. We propose a layer-skipping dynamic vision transformer (DynamicViT) that skips layers for each sample based on decisions made by a reinforcement learning agent. Extensive experiments on CIFAR-10 and CIFAR-100 showed that DynamicViT achieved an average speed increase of 40%, evaluated across batch sizes ranging from 1 to 1024.
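To make the per-sample layer-skipping idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each block is a residual update, so a skipped block behaves as the identity. The names `make_block`, `random_policy`, and `forward_with_skipping` are hypothetical, and the random policy merely stands in for the reinforcement learning agent, whose design the abstract does not specify.

```python
import random


def make_block(scale):
    """Toy stand-in for a transformer block: residual update x + scale * x."""
    def block(x):
        return [v + scale * v for v in x]
    return block


def random_policy(num_samples, num_layers, keep_prob, rng):
    """Hypothetical placeholder for the RL agent.

    Returns one keep/skip decision per (sample, layer) pair.
    """
    return [[rng.random() < keep_prob for _ in range(num_layers)]
            for _ in range(num_samples)]


def forward_with_skipping(batch, blocks, decisions):
    """Run each sample only through the blocks its decision row keeps.

    Returns the outputs and the number of block evaluations performed.
    """
    outputs = [list(x) for x in batch]
    executed = 0
    for layer_idx, block in enumerate(blocks):
        for i in range(len(outputs)):
            if decisions[i][layer_idx]:
                outputs[i] = block(outputs[i])
                executed += 1
            # Otherwise the block is skipped and the sample passes through unchanged.
    return outputs, executed


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [make_block(0.1) for _ in range(6)]
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    decisions = random_policy(len(batch), len(blocks), keep_prob=0.6, rng=rng)
    outputs, executed = forward_with_skipping(batch, blocks, decisions)
    print(f"executed {executed} of {len(batch) * len(blocks)} block evaluations")
```

In this sketch the saving comes purely from executing fewer blocks per sample. In a batched setting, samples that take different paths would additionally need to be grouped or masked, which is one reason measured speedups can vary with batch size.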