

Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)

Machine learning derived embeddings of bulk multi-omics data enable clinically significant representations in a pan-cancer cohort

Sanjay Nagaraj · Zachary McCaw · Theofanis Karaletsos · Daphne Koller · Anna Shcherbina

Keywords: [ omics ] [ co-embedding ] [ multi-ome ] [ variational autoencoder ]


Abstract:

Bulk multi-omics data provide a comprehensive view of tissue biology, but datasets rarely contain matched transcriptomic and chromatin accessibility data for a given sample. Furthermore, it is difficult to identify relevant genetic signatures from the high-dimensional, sparse representations that omics modalities provide. Machine learning (ML) models can extract dense, information-rich, denoised representations from omics data, which facilitates finding novel genetic signatures. To this end, we develop and compare generative ML models using an evaluation framework that examines the biological and clinical relevance of the latent embeddings they produce. We focus our analysis on pan-cancer data from 21 diverse cancer metacohorts across three datasets. Our best-performing models show strong clinical and biological signals and outperform traditional baselines.
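
The abstract does not describe the model architecture, but the keywords mention a variational autoencoder. As a rough illustration of how such a model turns a high-dimensional omics profile into a dense latent embedding, here is a minimal sketch of the core VAE computations (linear encoder, reparameterized sampling, closed-form KL term, and a negative ELBO). Everything in it is an assumption made for illustration: the function names, the linear encoder and decoder, the toy dimensions, and the random weights are hypothetical and are not the authors' implementation.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def encode(x, w_mu, w_logvar):
    """Map features to the mean and log-variance of a diagonal Gaussian."""
    return linear(w_mu, x), linear(w_logvar, x)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def negative_elbo(x, x_hat, mu, logvar):
    """Unit-variance Gaussian reconstruction term (up to a constant) plus KL."""
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, logvar)


# Toy example: 4 hypothetical input features, 2 latent dimensions.
rng = random.Random(0)
n_features, n_latent = 4, 2


def random_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


w_mu = random_matrix(n_latent, n_features)
w_logvar = random_matrix(n_latent, n_features)
w_dec = random_matrix(n_features, n_latent)

x = [0.0, 2.5, 0.0, 1.0]            # sparse, expression-like toy profile
mu, logvar = encode(x, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)  # stochastic latent sample used in training
x_hat = linear(w_dec, z)             # reconstruction from the latent sample
loss = negative_elbo(x, x_hat, mu, logvar)
```

In a setup like this, the encoder mean `mu` would typically serve as the dense embedding passed to downstream biological or clinical evaluations, while the sampled `z` is used only during training.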
