Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)

SecretoGen: towards prediction of signal peptides for efficient protein secretion

Felix Teufel · Carsten Stahlhut · Jan Refsgaard · Henrik Nielsen · Ole Winther · Dennis Madsen

Keywords: [ secretion ] [ Protein ] [ sequence generation ] [ transformer ] [ protein design ] [ signal peptide ]


Abstract:

Signal peptides (SPs) are short sequences at the N terminus of proteins that control their secretion in all living organisms. Secretion is of great importance in biotechnology, as industrial production of proteins in host organisms often requires the proteins to be secreted. SPs vary in secretion efficiency, which depends both on the host organism and on the protein they are combined with. Therefore, to optimize production yields, an SP with good efficiency needs to be identified for each protein. While SPs can be predicted accurately by machine learning models, such models have so far shown limited utility for predicting secretion efficiency. We introduce SecretoGen, a generative transformer trained on millions of naturally occurring SPs from diverse organisms. Evaluation on a range of secretion efficiency datasets shows that SecretoGen's perplexity is a promising criterion for selecting efficient SPs, without requiring training on experimental efficiency data.
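
As a rough illustration of perplexity-based selection, the sketch below ranks candidate SPs by the perplexity a generative model assigns to them. It is not SecretoGen's code: the function names, the candidate labels, and the per-residue log-probabilities are hypothetical placeholders, and it assumes that lower perplexity marks a more promising SP, a direction the abstract does not state explicitly. In practice the log-probabilities would come from the model, presumably conditioned on the target protein and host organism.

```python
import math
from typing import Dict, List


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_by_perplexity(candidates: Dict[str, List[float]]) -> List[str]:
    """Return candidate names sorted from lowest to highest perplexity.

    Assumption: lower perplexity is treated as the better score.
    """
    return sorted(candidates, key=lambda name: perplexity(candidates[name]))


# Hypothetical per-residue log-probabilities for two candidate SPs;
# real values would be produced by a trained generative model.
candidates = {
    "sp_a": [math.log(0.5)] * 4,
    "sp_b": [math.log(0.25)] * 4,
}
ranking = rank_by_perplexity(candidates)
```

Because perplexity is averaged per token, candidates of different lengths can be compared on the same scale, and no experimental efficiency labels are needed to produce the ranking.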
