Poster in Workshop: AI for Science: Progress and Promises
An "interpretable-by-design" neural network to decipher RNA splicing regulatory logic
Susan Liao · Mukund Sudarshan · Oded Regev
Keywords: [ functional genomics ] [ modeling biological systems ] [ interpretable AI ] [ RNA splicing ]
Artificial intelligence algorithms, in particular neural networks, capture complex quantitative relationships between input and output. However, because neural networks are typically black boxes, it is difficult to extract post-hoc insights into how they achieve their predictive success. Furthermore, they easily capture artifacts or biases in the training data, often fail to generalize beyond the datasets used for training and testing, and do not lead to new insights into the underlying processes. To enable scientific progress, machine learning models should not only accurately predict outcomes, but also describe how they arrived at their predictions. In recent years, neural networks have been applied to understanding biological processes, and specifically to deciphering RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Despite recent success using neural networks to predict splicing outcomes, understanding how specific RNA features dictate splicing outcomes remains an open challenge. The challenge is further underscored by the sensitivity of splicing logic, where almost all single-nucleotide changes along an exon can lead to dramatic changes in splicing outcomes. Here we demonstrate that an "interpretable-by-design" model achieves predictive accuracy without sacrificing interpretability and captures a unifying decision-making logic. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. Importantly, the model revealed novel components of splicing logic, which we experimentally validated. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction.
The network's ability to quantify the contributions of specific features to splicing outcomes for individual exons has considerable potential for a range of medical and biotechnology applications, including genome or RNA editing of target exons to correct splicing behavior and the rational design of RNA-based therapeutics such as antisense oligonucleotides.