NeurIPS Poster Decoupled Kullback-Leibler Divergence Loss

Poster

Decoupled Kullback-Leibler Divergence Loss

Jiequan Cui · Zhuotao Tian · Zhisheng Zhong · Xiaojuan Qi · Bei Yu · Hanwang Zhang

East Exhibit Hall A-C #2111

[ Abstract ] [ Project Page ]

[ Paper] [ Slides] [ Poster] [ OpenReview]

Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: In this paper, we delve deeper into the Kullback–Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error ($\mathbf{w}$MSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property. This modification ensures that the $\mathbf{w}$MSE component is always effective during training, providing extra constructive cues.Secondly, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples.With these two enhancements, we derive the Improved Kullback–Leibler (IKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100 and ImageNet datasets, focusing on adversarial training, and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public leaderboard --- \textit{RobustBench} and competitive performance on knowledge distillation, demonstrating the substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.

Chat is not available.