
Poster in Workshop: Gaze Meets ML

FoVAE: Reconstructive Foveation as a Self-Supervised Variational Inference Task for Visual Representation Learning

Ivan Vegner · Siddharth N · Leonidas Doumas

Keywords: [ variational autoencoder ] [ foveation ] [ reconstruction ] [ predictive coding ]


Abstract:

We present the first steps toward a model of visual representation learning driven by a self-supervised reconstructive foveation mechanism. Tasked with attending to one visual patch at a time, reconstructing the current patch, predicting the next patch, and reconstructing the full image after a set number of timesteps, FoVAE learns to reconstruct images from the MNIST and Omniglot datasets while inferring high-level priors about the whole image. In line with theories of Bayesian predictive coding in the brain and prior work on human foveation biases, the model combines bottom-up input processing with top-down learned priors to reconstruct its input, choosing foveation targets that balance local feature predictability against global information gain. FoVAE transfers its priors and foveation policy across datasets, reconstructing samples from unseen datasets in a zero-shot transfer-learning setting. By showing that robust, domain-general policies of generative inference and action-based information gathering emerge from simple, biologically plausible inductive biases, this work paves the way for further exploration of the role of foveation in visual representation learning.
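To make the training setup concrete, the sketch below illustrates the three-part objective the abstract describes: per-timestep reconstruction of the current patch, prediction of the next patch, and whole-image reconstruction at the end of the episode, each regularised with a standard VAE KL term. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the module names, patch and latent sizes, and the uniform-random stand-in for the learned foveation policy are all hypothetical.

    # Hypothetical sketch of the FoVAE-style objective described in the abstract.
    # All names, sizes, and the random foveation policy are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    PATCH, IMG, LATENT, STEPS = 7, 28, 32, 4  # assumed sizes for MNIST-like input

    class PatchVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Linear(PATCH * PATCH, 2 * LATENT)    # bottom-up patch encoder
            self.dec_patch = nn.Linear(LATENT, PATCH * PATCH)  # current-patch decoder
            self.dec_next = nn.Linear(LATENT, PATCH * PATCH)   # next-patch predictor
            self.dec_image = nn.Linear(LATENT, IMG * IMG)      # whole-image decoder

        def forward(self, patch):
            mu, logvar = self.enc(patch).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
            return z, mu, logvar

    def crop(img, y, x):
        # Extract and flatten the PATCH x PATCH window at (y, x) from a (B, IMG, IMG) batch.
        return img[:, y:y + PATCH, x:x + PATCH].flatten(1)

    def training_step(model, img):
        # One foveation episode: random targets stand in for the learned policy.
        locs = torch.randint(0, IMG - PATCH + 1, (STEPS + 1, 2)).tolist()
        loss = 0.0
        for t in range(STEPS):
            patch = crop(img, *locs[t])
            z, mu, logvar = model(patch)
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
            loss = loss + kl
            # (1) reconstruct the currently foveated patch
            loss = loss + F.mse_loss(model.dec_patch(z), patch)
            # (2) predict the next patch before it is seen
            loss = loss + F.mse_loss(model.dec_next(z), crop(img, *locs[t + 1]))
        # (3) reconstruct the full image after the final timestep
        loss = loss + F.mse_loss(model.dec_image(z), img.flatten(1))
        return loss

    model = PatchVAE()
    imgs = torch.rand(16, IMG, IMG)  # stand-in batch of MNIST-like images
    print(training_step(model, imgs).item())

In the paper's framing, the next-patch prediction term is what couples the generative model to the foveation policy; here a uniform-random policy is used purely so the sketch runs end to end.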
