Poster in Workshop: Workshop on Machine Learning and Compression
Differentiable Attention
Yancheng Wang · Dongfang Sun · Yingzhen Yang
Self-attention is widely used in deep learning, and recent work has incorporated self-attention modules into convolutional neural networks for computer vision. Previous approaches usually compute feature affinity from a fixed set of channels, which prevents the module from selecting the most informative channels and can hurt performance on downstream tasks. In this paper, we propose a novel attention module termed Differentiable Attention (DA). In contrast with conventional self-attention, DA searches for the locations and key dimension of channels in a continuous space using a novel differentiable search method. The DA module is compatible with either a fixed neural network backbone or a backbone learned with Differentiable Neural Architecture Search (DNAS), yielding DA with Fixed Backbone (DA-FB) and DA-DNAS, respectively. We apply DA-FB and DA-DNAS to two computer vision tasks, person re-identification (Re-ID) and image classification, and obtain state-of-the-art results on standard benchmarks with more compact architectures than competing methods, demonstrating the advantage of DA.
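The abstract does not spell out the search procedure, so the following is only an illustrative sketch of the general idea of choosing affinity channels in a continuous space, not the authors' DA formulation. It assumes a sigmoid relaxation: each channel gets a learnable logit, the logit is squashed into a soft gate in (0, 1), and the gates weight each channel's contribution to the feature affinity. All names (`soft_channel_gates`, `gated_self_attention`, `temperature`) are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_channel_gates(logits, temperature=1.0):
    """Relax a binary keep/drop choice per channel into a value in (0, 1).

    Assumption: a sigmoid relaxation stands in for the paper's search method.
    """
    return [sigmoid(a / temperature) for a in logits]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gated_self_attention(features, logits, temperature=1.0):
    """features: list of N positions, each a list of C channel values.

    Returns (outputs, affinities), where affinities[i][j] is the attention
    weight that position i places on position j.
    """
    gates = soft_channel_gates(logits, temperature)
    # Soft analogue of sqrt(key dimension): the gates sum to the
    # "effective" number of selected channels. Assumes moderate logits,
    # so the sum stays strictly positive.
    scale = math.sqrt(sum(gates))
    outputs, affinities = [], []
    for q in features:
        # Affinity uses only gate-weighted channels.
        scores = [
            sum(g * a * b for g, a, b in zip(gates, q, k)) / scale
            for k in features
        ]
        weights = softmax(scores)
        affinities.append(weights)
        # Values use all channels; only the affinity is gated.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, features))
             for c in range(len(q))]
        )
    return outputs, affinities

# Toy input: 3 positions, 2 channels; the logits favor channel 0.
example_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_outputs, example_affinities = gated_self_attention(
    example_features, logits=[4.0, -4.0]
)
```

Because the gates are smooth functions of the logits, the channel choice could in principle be trained by gradient descent alongside the network weights; in this toy version, lowering `temperature` pushes the gates toward a hard 0/1 selection.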