Poster in Workshop: AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond

Learning biologically relevant features in a pathology foundation model using sparse autoencoders

Nhat Le · Neel Patel · Ciyue Shen · Blake Martin · Alfred Eng · Chintan Shah · Sean Grullon · Dinkar Juyal


Abstract:

Pathology plays an important role in disease diagnosis, treatment decision-making, and drug development. Previous work on interpretability for machine learning models on pathology images has revolved around methods such as attention value visualization and deriving human-interpretable features from model heatmaps. Mechanistic interpretability is an emerging area of model interpretability that focuses on reverse-engineering neural networks. Sparse Autoencoders (SAEs) have emerged as a promising direction for extracting monosemantic features from model activations. In this work, we train a Sparse Autoencoder on the embeddings of a pathology-pretrained foundation model. We discover an interpretable sparse representation of biological concepts within the model embedding space. We investigate how these representations are associated with quantitative human-interpretable features. Our work paves the way for further exploration of interpretable feature dimensions and their utility for medical and clinical applications.
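
To make the setup concrete, below is a minimal sketch of training a sparse autoencoder on precomputed foundation-model embeddings. The abstract does not specify the architecture, sparsity penalty, dictionary size, or embedding dimension, so the PyTorch implementation, the L1 penalty, and all hyperparameters and tensor shapes here are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder; sparsity is encouraged via an L1 penalty on hidden activations."""
    def __init__(self, d_embed: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_embed)

    def forward(self, x):
        # ReLU keeps hidden activations non-negative, which pairs naturally with an L1 penalty
        h = F.relu(self.encoder(x))
        x_hat = self.decoder(h)
        return x_hat, h

def train_sae(embeddings, d_hidden=4096, l1_coeff=1e-3, epochs=10, lr=1e-4):
    """Fit an SAE on a tensor of precomputed embeddings (assumed shape: [n_samples, d_embed])."""
    d_embed = embeddings.shape[1]
    model = SparseAutoencoder(d_embed, d_hidden)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(embeddings), batch_size=256, shuffle=True
    )
    for _ in range(epochs):
        for (x,) in loader:
            x_hat, h = model(x)
            # Reconstruction loss plus L1 sparsity penalty on the hidden code
            loss = F.mse_loss(x_hat, x) + l1_coeff * h.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    # Hypothetical placeholder: random vectors standing in for patch-level pathology embeddings
    fake_embeddings = torch.randn(10_000, 1024)
    sae = train_sae(fake_embeddings)
```

After training, rows of the decoder weight matrix can be treated as dictionary directions, and the sparse hidden activations can be inspected or correlated with quantitative human-interpretable features, in the spirit of the analysis described in the abstract.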
