Poster

Interpretable Machine Learning for Datasets with Missing Values

Hayden McTavish · Jon Donnelly · Margo Seltzer · Cynthia Rudin

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Many important datasets contain samples that are missing one or more features. Maintaining the interpretability of machine learning models in the presence of this missing data is challenging. Singly or multiply imputing missing values complicates the model’s mapping from features to labels. On the other hand, reasoning on indicator variables that represent missingness introduces a potentially large number of additional terms, sacrificing sparsity. We solve these problems with M-GAM, a sparse generalized additive modeling approach that incorporates missingness indicators and their interaction terms while maintaining sparsity through $\ell_0$ regularization. We show that M-GAM provides similar or superior accuracy to prior methods while significantly improving sparsity relative to either imputation or naive inclusion of indicator variables.
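As a rough illustration of why naive inclusion of indicator variables costs sparsity, the sketch below augments a feature matrix with one missingness indicator per feature and one interaction term per ordered pair of distinct features. It is not the authors' M-GAM implementation: the function name `augment_with_missingness`, the zero-fill of missing entries, and the product form of the interaction terms are assumptions made for illustration, and the $\ell_0$-regularized fitting step is not shown.

```python
import numpy as np


def augment_with_missingness(X):
    """Hypothetical helper: append missingness indicators and interactions.

    Returns (Z, names), where Z stacks, in order:
      - the features with missing entries set to 0 (an assumption, so a
        missing feature contributes nothing through its own term),
      - one 0/1 missingness indicator per feature,
      - one interaction per ordered pair (j, k), j != k: feature j
        multiplied by the missingness indicator of feature k.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape

    missing = np.isnan(X)                 # True where a value is absent
    filled = np.where(missing, 0.0, X)    # zero-fill, not an imputation model
    indicators = missing.astype(float)

    columns = [filled, indicators]
    names = [f"x{j}" for j in range(n_features)]
    names += [f"miss{j}" for j in range(n_features)]

    for j in range(n_features):
        for k in range(n_features):
            if j == k:
                continue
            # Lets the effect of feature j differ when feature k is missing.
            columns.append((filled[:, j] * indicators[:, k])[:, None])
            names.append(f"x{j}*miss{k}")

    return np.hstack(columns), names


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])
Z, names = augment_with_missingness(X)
print(Z.shape)  # (2, 12): 3 features + 3 indicators + 6 interactions
```

With $d$ features this construction produces $d + d + d(d-1) = d^2 + d$ columns, which is the growth in candidate terms that a sparsity-inducing penalty would need to prune.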
