

Poster in Workshop: Machine Learning and the Physical Sciences

Missing Data Imputation for Galaxy Redshift Estimation

Kieran Luken


Abstract:

Astronomical data is full of holes. While data can be missing for many reasons, some values are missing at random because of things like data corruption or unfavourable observing conditions. We test five simple data imputation methods (Mean, Median, Minimum, Maximum and k-Nearest Neighbours (kNN)) and two more complex methods (Multivariate Imputation by Chained Equations (MICE) and Generative Adversarial Imputation Network (GAIN)) on data in which an increasing fraction of values is randomly set to missing. We then use the imputed datasets to estimate the redshifts of the galaxies with the kNN and Random Forest machine learning techniques. We find that the MICE algorithm provides the lowest Root Mean Square Error and consequently the lowest prediction error, with the GAIN algorithm the next best.
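
The evaluation protocol described in the abstract (set a growing fraction of values to missing at random, impute them, then score the imputed values against the known truth) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the data are synthetic Gaussian features rather than galaxy measurements, the missing fractions are arbitrary, the function names (`mask_at_random`, `impute_columns`, `rmse_on_masked`) are hypothetical, and only the four summary-statistic imputers are covered (kNN, MICE and GAIN are omitted).

```python
import math
import random
import statistics


def mask_at_random(rows, fraction, rng):
    """Return a copy of rows with roughly `fraction` of entries set to None."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]


def impute_columns(rows, strategy):
    """Fill None entries column by column using a statistic of the observed values."""
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        fills.append(strategy(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_masked(truth, masked, imputed):
    """Root mean square error over the entries that were set to missing."""
    squared = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(sum(squared) / len(squared)) if squared else 0.0


rng = random.Random(0)

# Assumption: synthetic stand-in data, 200 objects with 3 Gaussian features each.
truth = [[rng.gauss(20.0, 1.5) for _ in range(3)] for _ in range(200)]

# The four summary-statistic imputers named in the abstract.
STRATEGIES = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
}

results = {}
for fraction in (0.05, 0.1, 0.2, 0.3):  # assumed missing fractions
    masked = mask_at_random(truth, fraction, rng)
    for name, strategy in STRATEGIES.items():
        imputed = impute_columns(masked, strategy)
        results[(fraction, name)] = rmse_on_masked(truth, masked, imputed)

for (fraction, name), err in sorted(results.items()):
    print(f"missing={fraction:.2f}  method={name:<6}  RMSE={err:.3f}")
```

Scoring only the masked entries keeps the error from being diluted by values that were never removed. The imputed table could then be passed to a redshift regressor in a second step to measure the downstream prediction error.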
