Poster
in
Workshop: Interpretable AI: Past, Present and Future
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser · Gemma Moran · Aditya Ravuri · Kian Kenyon-Dean · Kristina Ulicna · Cian Eastwood · Jason Hartford
Dictionary learning (DL) has emerged as a powerful interpretability tool for large language models. By extracting known concepts (e.g., the Golden Gate Bridge), sparse dictionaries elucidate a model's inner workings. In this work, we ask whether DL can be used to discover unknown concepts, ultimately enabling new approaches to scientific discovery. As a first step, we apply DL algorithms to microscopy foundation models trained on multi-cell image data, a setting where little prior knowledge exists about which high-level concepts should arise. We show that sparse dictionaries indeed extract biologically meaningful concepts such as cell type and genetic perturbation type. Furthermore, we propose PCA whitening on a control group as a pre-processing step, together with an orthogonal matching pursuit DL algorithm, as an alternative to the commonly used sparse auto-encoders.
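
The following is a minimal sketch of the pipeline outlined in the abstract, not the authors' implementation: PCA whitening fit on control-group embeddings, followed by dictionary learning with orthogonal matching pursuit (OMP) sparse coding. All names and hyperparameters (extract_concepts, n_atoms, n_nonzero, the boolean control mask) are illustrative assumptions, and scikit-learn is used as a stand-in toolkit.

# Minimal sketch, assuming dense foundation-model embeddings as NumPy arrays
# and a boolean mask identifying the control group. Hyperparameters are placeholders.
import numpy as np
from sklearn.decomposition import PCA, MiniBatchDictionaryLearning

def extract_concepts(embeddings, control_mask, n_atoms=256, n_nonzero=8):
    """Whiten with PCA fit on the control group, then learn a sparse dictionary with OMP coding."""
    # Fit the whitening transform on control embeddings only; apply it to all embeddings.
    whitener = PCA(whiten=True).fit(embeddings[control_mask])
    whitened = whitener.transform(embeddings)

    # Learn dictionary atoms; OMP limits each sample to at most `n_nonzero` active atoms.
    dl = MiniBatchDictionaryLearning(
        n_components=n_atoms,
        transform_algorithm="omp",
        transform_n_nonzero_coefs=n_nonzero,
        random_state=0,
    )
    codes = dl.fit_transform(whitened)  # sparse codes: (n_samples, n_atoms)
    atoms = dl.components_              # dictionary atoms: (n_atoms, n_features)
    return codes, atoms

# Example usage with synthetic data standing in for microscopy-model embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))
ctrl = rng.random(1000) < 0.3
codes, atoms = extract_concepts(X, ctrl)

One design note: fitting the whitening transform on control samples only, then applying it everywhere, centers and rescales the representation relative to the unperturbed baseline, so dictionary atoms are more likely to capture deviations induced by perturbations rather than nuisance variation shared with the controls.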