

Poster Session in Workshop: Scientific Methods for Understanding Neural Networks

Knowledge Distillation: The Functional Perspective

Gabryel Mason-Williams · Israel Mason-Williams · Mark Sandler

Sun 15 Dec 4:30 p.m. PST — 5:30 p.m. PST

Abstract:

Empirical findings of accuracy correlations between students and teachers in the knowledge distillation framework have served as supporting evidence for knowledge transfer. In this paper, we sought to explain and understand the knowledge transfer derived from knowledge distillation via functional similarity, hypothesising that knowledge distillation produces a student that is functionally similar to its teacher model. While we accept this hypothesis for two of the three architectures, across a range of functional-analysis metrics and against four controls, the results show that knowledge transfer plays a less significant role than expected. Furthermore, results from using uniform and Gaussian noise as teachers suggest that the knowledge-sharing aspects of knowledge distillation inadequately explain the accuracy benefits observed from the knowledge distillation training setup itself. Moreover, in the first instance, we show that knowledge distillation is not a compression mechanism but primarily a data-dependent training regulariser with a small capacity to transfer knowledge in the best case.
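The abstract refers to the standard knowledge distillation training setup and to controls that replace the teacher with uniform or Gaussian noise. The sketch below illustrates what such a setup typically looks like; it is a minimal, assumed Hinton-style formulation (temperature-scaled KL term blended with hard-label cross-entropy), not the paper's exact loss, and the function name, `temperature`, and `alpha` values are illustrative assumptions.

```python
# Minimal sketch of a standard knowledge-distillation loss (Hinton-style).
# Assumption: the paper's exact formulation and hyperparameters are not
# given on this page; this is illustrative only.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft KL term against the teacher with hard cross-entropy."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# A "noise teacher" control, of the kind the abstract describes, can be
# simulated by replacing teacher_logits with random logits of the same shape:
# teacher_logits = torch.randn_like(student_logits)          # Gaussian noise
# teacher_logits = torch.rand_like(student_logits) * 2 - 1   # uniform noise
```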
