

Poster in Workshop: Workshop on Machine Learning and Compression

M2M-TAG: Training-Free Many-to-Many Token Aggregation for Vision Transformer Acceleration

Fanhu Zeng · Deli Yu


Abstract:

Vision transformers have been widely explored due to their unprecedented performance in various downstream tasks. However, their heavy computational cost restricts real-world deployment, and much interest has arisen in dynamically compressing the tokens of vision transformers. Current methods mainly rely on token pruning or merging to reduce the number of tokens, which inevitably leads to substantial information loss. In this paper, we regard the token reduction process as a matrix transformation of tokens and propose a many-to-many token aggregation framework, M2M-TAG, which serves as a generalized form of all existing methods. The parameter-free many-to-many transformation is constructed by combining importance and similarity metrics of all tokens in a global scope. The aggregated tokens preserve token information to the greatest extent and enable training-free acceleration. We employ M2M-TAG as a plug-and-play module to accelerate vision transformers and conduct extensive experiments to demonstrate the effectiveness of the proposed framework. Specifically, we reduce FLOPs by 34.8% with only a 0.1% accuracy drop on DeiT-S without fine-tuning, even outperforming some existing fine-tuning methods. Comprehensive results further show that the approach achieves competitive performance with a better computation-performance trade-off, impressive budget reduction, and maximum inference acceleration.
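The abstract frames token reduction as multiplying the token matrix by an aggregation matrix. It does not specify how that matrix is built, so the following is only a minimal sketch of the general idea: the center selection, softmax weighting, and the source of the importance scores are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def many_to_many_aggregate(tokens: torch.Tensor,
                           importance: torch.Tensor,
                           num_out: int) -> torch.Tensor:
    """Illustrative many-to-many token aggregation.

    tokens:     (N, D) token embeddings
    importance: (N,)   per-token importance scores (e.g., CLS attention;
                       assumed here, not specified in the abstract)
    num_out:    M      number of aggregated output tokens (M < N)
    """
    # Use the M most important tokens as aggregation centers (assumption).
    centers = importance.topk(num_out).indices            # (M,)

    # Cosine similarity between each center and every token (assumption).
    t = F.normalize(tokens, dim=-1)
    sim = t[centers] @ t.T                                 # (M, N)

    # Many-to-many aggregation matrix: every input token contributes to
    # every output token, weighted by similarity and importance, with
    # rows normalized so each output is a convex combination of inputs.
    W = torch.softmax(sim, dim=-1) * importance            # (M, N)
    W = W / W.sum(dim=-1, keepdim=True)

    # Token reduction as a matrix transformation: (M, N) @ (N, D) -> (M, D)
    return W @ tokens
```

Because every row of the aggregation matrix draws on all N tokens rather than discarding or pairwise-merging them, one-to-one pruning and one-to-many merging fall out as special cases where the rows are one-hot or block-sparse, which is the sense in which the framework generalizes existing methods.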
