Poster
DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism
Cheonjun Park · Mincheol Park · Hyunchan Moon · Myung Kuk Yoon · Seokjin Go · Suhyun Kim · Won Woo Ro
Thu 12 Dec 4:30 p.m. PST to 7:30 p.m. PST
Abstract:
Depth-wise Separable Convolution (DSConv) has a powerful representation even with fewer parameters and computation, leading to its adoption by almost all of the state-of-the-art CNN models. DSConv models are already compact, making it hard to apply pruning, and there are no previous pruning techniques that even target depth-wise convolution (DW-conv). In this paper, we present the novel Depth-wise Separable Convolution Pruning (DEPrune), the first pruning implementation on not only point-wise convolution but also DW-conv, which is optimized by considering and analyzing the computation of DSConv on GPU, the most widely used AI accelerator. DEPrune employs a fine-grained pruning approach, yet it achieves the structured sparsity typically absent in fine-grained pruning, enabling practical hardware acceleration. Moreover, this method maintains a high pruning ratio without causing any accuracy drop. We additionally present techniques that further enhance DEPrune performance: 1) balanced workload tuning (BWT), and 2) hardware-aware sparsity recalibration (HSR). Experimental results show that DEPrune achieves up to $3.74\times$ practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet.
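To make the "fewer parameters" claim concrete, the sketch below compares the weight count of a standard convolution with that of a DSConv layer (a depth-wise convolution followed by a 1x1 point-wise convolution). It uses only the standard textbook parameter formulas and is not taken from the paper; the function names and the layer sizes (3x3 kernel, 32 input channels, 64 output channels) are illustrative assumptions, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel (DW-conv)
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
k, c_in, c_out = 3, 32, 64

std = standard_conv_params(k, c_in, c_out)
ds = dsconv_params(k, c_in, c_out)
ratio = ds / std  # equals 1 / c_out + 1 / k**2

print(f"standard conv weights: {std}")
print(f"DSConv weights:        {ds}")
print(f"DSConv / standard:     {ratio:.4f}")
```

Under these assumed sizes, the DSConv layer keeps roughly an eighth of the standard layer's weights, which illustrates why such models leave little slack for pruning.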