Poster in Workshop: Machine Learning in Structural Biology
Estimating protein flexibility via uncertainty quantification of structure prediction models
Charlotte Sweeney · Nele Quast · Fabian Spoendlin · Yee Whye Teh
Deep learning architectures such as AlphaFold2 have effectively solved the protein structure prediction problem; however, they do not rigorously account for conformational variance, even though many proteins contain flexible regions in which a single amino acid sequence can occupy a variety of conformations. In particular, confidence metrics such as the pLDDT score do not readily distinguish regions where the prediction model is uncertain because the region is out-of-distribution from regions where it is uncertain because the region is intrinsically flexible. Here, we propose a novel approach that estimates protein flexibility via uncertainty quantification. Specifically, we reformulate the protein structure prediction problem as sampling a backbone function from a Gaussian process, which enables us to cast flexibility estimation as aleatoric uncertainty quantification. We adapt the AlphaFold2 Structure Module architecture to produce such estimates of aleatoric uncertainty and compare them with existing proxies for conformational variance. We demonstrate the utility of our formalisation for approximating protein flexibility in a prediction framework; our experiments show the promise of the method whilst emphasising the relationship between epistemic and aleatoric uncertainty in protein structure prediction.
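The abstract does not specify a training objective, so the following is only a generic point of reference for the distinction it draws between aleatoric and epistemic uncertainty. It is a minimal sketch assuming (1) a network that predicts, for each residue, a C-alpha position together with a single isotropic variance, trained with a heteroscedastic Gaussian negative log-likelihood, and (2) an ensemble of such networks whose disagreement serves as the epistemic term via the law of total variance. This is not the authors' Gaussian-process formulation or their adapted Structure Module; the function names `gaussian_nll` and `decompose_uncertainty`, the isotropic per-residue variance, and the use of an ensemble are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_nll(true_xyz, mean_xyz, log_var):
    """Per-residue negative log-likelihood of observed C-alpha coordinates
    under an isotropic Gaussian N(mean, exp(log_var) * I_3).

    true_xyz, mean_xyz: arrays of shape (L, 3); log_var: shape (L,).
    A residue with a large predicted variance is penalised less for a large
    positional error but pays a log-variance cost, so the model can learn
    input-dependent (aleatoric) noise.
    """
    var = np.exp(log_var)
    sq_err = np.sum((true_xyz - mean_xyz) ** 2, axis=-1)  # shape (L,)
    d = true_xyz.shape[-1]  # number of spatial dimensions (3)
    return 0.5 * (sq_err / var + d * log_var + d * np.log(2.0 * np.pi))


def decompose_uncertainty(means, log_vars):
    """Split per-residue predictive variance across an ensemble of M models.

    means: (M, L, 3) predicted positions; log_vars: (M, L) predicted
    isotropic log-variances. By the law of total variance (per axis):
      aleatoric = mean over models of sigma_m^2  (average predicted noise)
      epistemic = variance over models of mu_m   (model disagreement),
    with the epistemic term averaged over the three axes so both terms
    are per-axis variances and can be summed.
    """
    aleatoric = np.exp(log_vars).mean(axis=0)    # shape (L,)
    epistemic = means.var(axis=0).mean(axis=-1)  # shape (L,)
    return aleatoric, epistemic, aleatoric + epistemic


# Usage on random placeholder data (not real predictions):
rng = np.random.default_rng(0)
means = rng.normal(size=(5, 4, 3))          # 5 models, 4 residues
log_vars = np.log(np.full((5, 4), 0.25))    # every model predicts sigma^2 = 0.25
alea, epi, total = decompose_uncertainty(means, log_vars)
```

Under this toy decomposition, a residue that all ensemble members place consistently but with a large predicted variance would read as intrinsically flexible (high aleatoric, low epistemic), whereas a residue on which the members disagree would read as poorly covered by the training data (high epistemic). A single confidence score that conflates the two cannot make this distinction.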