Talk in Workshop: NeurIPS'24 Workshop on Causal Representation Learning
Missing Data with ? and 0 Missingness Tokens: Identification and Estimation (Invited Talk by Ilya Shpitser)
Missing data is a ubiquitous issue in applied data science problems of all types. I show that many classical missing data models described in the literature have full data distributions that factorize with respect to a directed acyclic graph (DAG). I show that all models of this type, including many missing not at random (MNAR) models, have a simple identifiability characterization, which leads to a natural maximum likelihood plug-in estimation strategy. In addition, I describe a semiparametric estimator based on influence functions derived in a supermodel of all identifiable DAG missing data models, termed the "self-censoring model."
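To make the "?" token and plug-in identification concrete, here is a minimal Python sketch for one of the simplest DAG missing data models, assuming a missing at random (MAR) structure X1 → X2, X1 → R2 with X1 always observed. The function names (`simulate`, `plug_in_mean`, `complete_case_mean`) and all probabilities are hypothetical illustrations, not the estimator from the talk. Because R2 depends only on X1, p(x2 | x1) = p(x2 | x1, R2 = 1), so E[X2] can be estimated by plugging observed-data frequencies into Σ p(x1) E[X2 | x1, R2 = 1].

```python
import random

MISSING = "?"  # explicit missingness token: X2* = X2 if R2 == 1, else "?"

def simulate(n, seed=0):
    """Simulate a hypothetical MAR model: X1 -> X2 and X1 -> R2.

    Returns a list of (x1, x2_star) pairs, where x2_star is 0, 1, or "?".
    The probabilities are made up purely for illustration."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = 1 if rng.random() < (0.8 if x1 else 0.3) else 0
        r2 = 1 if rng.random() < (0.9 if x1 else 0.4) else 0
        data.append((x1, x2 if r2 else MISSING))
    return data

def plug_in_mean(data):
    """Plug-in estimate of E[X2] = sum_x1 p(x1) * E[X2 | x1, R2 = 1].

    Valid under MAR because R2 is independent of X2 given X1."""
    n = len(data)
    total = 0.0
    for v in (0, 1):
        stratum = [x2 for x1, x2 in data if x1 == v]
        observed = [x2 for x2 in stratum if x2 != MISSING]
        total += (len(stratum) / n) * (sum(observed) / len(observed))
    return total

def complete_case_mean(data):
    """Naive mean over rows where X2 is observed (biased under MAR here)."""
    observed = [x2 for _, x2 in data if x2 != MISSING]
    return sum(observed) / len(observed)

if __name__ == "__main__":
    d = simulate(200_000, seed=1)
    print(plug_in_mean(d), complete_case_mean(d))
```

With these settings the true mean is 0.5 × 0.8 + 0.5 × 0.3 = 0.55. The plug-in estimate should land near it, while the complete-case mean drifts toward 0.42 / 0.65 ≈ 0.65, because units with X1 = 1 are both more likely to have X2 = 1 and more likely to be observed.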
Finally, I describe a challenging extension of missing data problems in which the missingness token is the value 0 rather than a special token such as "?". In these problems of missingness via zero inflation, parameters are not identified even in very simple analogues of missing completely at random (MCAR) models. However, I show that a modification of the Kuroki-Pearl effect restoration approach, together with an informative proxy for the missingness indicator, allows sharp bounds on target parameters to be derived. I illustrate this approach with an application to estimating rates of central line-associated bloodstream infections (CLABSIs) in data where their prevalence is undercounted.
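The non-identification of zero-token models can be checked with a toy calculation. The sketch below assumes a binary X, a missingness indicator R independent of X (an MCAR-style analogue), and an observed X* = X · R, so a missing value is recorded as 0 and is indistinguishable from a true 0. The function name `observed_law` and the parameter values are hypothetical; exact fractions are used so that equality is not muddied by floating-point rounding.

```python
from fractions import Fraction

def observed_law(p_x, p_r):
    """Distribution of X* = X * R when X ~ Bernoulli(p_x), R ~ Bernoulli(p_r),
    and R is independent of X. Missing values are recorded as 0."""
    p_one = p_x * p_r          # X* = 1 only if X = 1 and the value is not missing
    return {1: p_one, 0: 1 - p_one}

# Two hypothetical settings with very different prevalences of X = 1:
law_a = observed_law(Fraction(1, 5), Fraction(1, 1))  # 20% prevalence, nothing missing
law_b = observed_law(Fraction(2, 5), Fraction(1, 2))  # 40% prevalence, half zeroed out

print(law_a == law_b)  # True: the observed data cannot tell them apart
```

Since only the product c = P(X = 1) P(R = 1) is identified, any P(R = 1) in [c, 1] paired with P(X = 1) = c / P(R = 1) fits the data equally well, so the observed law alone only places P(X = 1) somewhere in [c, 1]. Additional information, such as the informative proxy described above, is needed to say more.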