Spotlight
in
Workshop: Information-Theoretic Principles in Cognitive Systems
On the informativeness of supervision signals
Ilia Sucholutsky · Raja Marjieh · Tom Griffiths
Learning transferable representations by training a classifier is a well-established technique in deep learning (e.g. ImageNet pretraining), but there is a lack of theory to explain why this kind of task-specific pre-training should result in 'good' representations. We conduct an information-theoretic analysis of several commonly-used supervision signals to determine how they contribute to representation learning performance and how the dynamics are affected by training parameters like the number of labels, classes, and dimensions in the training dataset. We confirm these results empirically in a series of simulations and conduct a cost-benefit analysis to establish a tradeoff curve allowing users to optimize the cost of supervising representation learning.