Poster in Workshop: Shared Visual Representations in Human and Machine Intelligence
What can 5.17 billion regression fits tell us about artificial models of the human visual system?
Colin Conwell · Jacob Prince · George Alvarez · Talia Konkle
Rapid simultaneous advances in machine vision and cognitive neuroimaging present an unparalleled opportunity to assess the current state of artificial models of the human visual system. Here, we perform a large-scale benchmarking analysis of 72 modern deep neural network models to characterize, with robust statistical power, how differences in architecture and training task contribute to the prediction of human fMRI activity across 16 distinct regions of the human visual system. We find: first, that even stark architectural differences (e.g., the absence of convolution in transformers and MLP-Mixers) have very little consequence for emergent fits to brain data; second, that differences in task have clear effects, with categorization and self-supervised models showing relatively stronger brain predictivity across the board; third, that feature reweighting leads to substantial improvements in brain predictivity without overfitting, yielding model-to-brain regression weights that generalize at the same level of predictivity to brain responses over thousands of new images. Broadly, this work presents a lay of the land for the emergent correspondences between the feature spaces of modern deep neural network models and the representational structure inherent to the human visual system.
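The abstract's feature-reweighting analysis corresponds to a standard voxelwise encoding-model procedure: regularized regression from model activations to fMRI responses, scored on held-out images. The sketch below is a minimal illustration of that general approach, not the authors' actual pipeline; the names `features`, `voxels`, and `fit_encoding_model`, the alpha grid, and the 80/20 split are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split


def fit_encoding_model(features, voxels, alphas=np.logspace(-1, 5, 7),
                       test_size=0.2, seed=0):
    """Fit a ridge regression from model features (n_images x n_units)
    to fMRI responses (n_images x n_voxels), then score generalization
    as the per-voxel Pearson correlation on held-out images."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, voxels, test_size=test_size, random_state=seed)

    # Cross-validated ridge selects the regularization strength,
    # guarding against overfitting the (often very wide) feature matrix.
    model = RidgeCV(alphas=alphas)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    # Per-voxel Pearson r between predicted and observed held-out responses.
    y_pred_c = y_pred - y_pred.mean(axis=0)
    y_test_c = y_test - y_test.mean(axis=0)
    r = (y_pred_c * y_test_c).sum(axis=0) / (
        np.linalg.norm(y_pred_c, axis=0) * np.linalg.norm(y_test_c, axis=0))
    return model, r


# Illustrative usage with random stand-ins for real data:
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 512))   # DNN layer activations per image
voxels = rng.standard_normal((1000, 200))     # fMRI responses per image
model, r = fit_encoding_model(features, voxels)
print(f"median held-out voxel correlation: {np.median(r):.3f}")
```

Under this setup, "generalization without overfitting" corresponds to the held-out correlations remaining comparable to the fit quality on training images when the learned weights are applied to responses over new stimuli.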