

Poster

The Multimodal Universe: Enabling Large-Scale Machine Learning with 70TBs of Astronomical Scientific Data

Eirini Angeloudi · Jeroen Audenaert · Micah Bowles · Benjamin M. Boyd · David Chemaly · Brian Cherinka · Ioana Ciucă · Miles Cranmer · Aaron Do · Matthew Grayling · Erin E. Hayes · Tom Hehir · Shirley Ho · Marc Huertas-Company · Kartheik Iyer · Maja Jablonska · Francois Lanusse · Henry Leung · Kaisey Mandel · Rafael Martínez-Galarza · Peter Melchior · Lucas Meyer · Liam Parker · Helen Qu · Jeff Shen · Michael Smith · Connor Stone · Mike Walmsley · John Wu

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

We present the Multimodal Universe, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, our dataset contains hundreds of millions of astronomical observations, constituting over 70TB of multi-channel and hyper-spectral images, spectra, and multivariate time series, as well as a wide variety of associated scientific measurements and metadata. In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multimodal models specifically targeted towards scientific applications. All code used to compile the dataset, along with a description of how to access the data, is available at https://github.com/MultimodalUniverse/MultimodalUniverse
