Oral in Workshop: Workshop on Distribution Shifts: New Frontiers with Foundation Models
Domain constraints improve risk prediction when outcome data is missing
Sidhika Balachandar · Nikhil Garg · Emma Pierson
Keywords: [ Distribution Shift ] [ selective labels ] [ Health ] [ domain constraint ] [ Bayesian model ] [ biomedicine ]
Machine learning models often predict the outcome resulting from a human decision. For example, if a doctor tests a patient for disease, will the patient test positive? A challenge is that the human decision censors the outcome data: we only observe test outcomes for patients whom doctors historically tested. Untested patients, for whom outcomes are unobserved, may differ from tested patients along observed and unobserved dimensions. We propose a Bayesian model for this setting that estimates risk for both tested and untested patients. To aid model estimation, we propose two domain-specific constraints that are plausible in health settings: a prevalence constraint, where the overall disease prevalence is known, and an expertise constraint, where the human decision-maker deviates from purely risk-based decision-making only along a constrained feature set. We show theoretically and on synthetic data that the constraints can improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model can identify suboptimalities in test allocation and that the prevalence constraint increases the plausibility of inferences.
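The selective-labels problem described in the abstract, and the reason a known prevalence helps, can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not the authors' Bayesian model: the logistic forms, the coefficients, and the names `simulate`, `x`, `z`, and `tested` are all hypothetical choices made for illustration. An unobserved feature `z` raises both disease risk and the chance of being tested, so the positive rate among tested patients overstates population prevalence. If the overall prevalence is known, the average risk among untested patients follows from simple arithmetic.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Toy selective-labels data (all functional forms are assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # feature the model observes
    z = rng.normal(size=n)  # feature only the doctor sees (unobserved)
    # Disease risk depends on both observed and unobserved features.
    risk = 1.0 / (1.0 + np.exp(-(x + z - 2.0)))
    y = rng.random(n) < risk
    # The testing decision also depends on both features.
    p_test = 1.0 / (1.0 + np.exp(-(x + 2.0 * z - 1.0)))
    tested = rng.random(n) < p_test
    return x, y, tested


x, y, tested = simulate()

true_prev = y.mean()              # known only in simulation
tested_prev = y[tested].mean()    # what the censored data reveals
untested_prev = y[~tested].mean() # never observed in practice

# Prevalence constraint as arithmetic: with overall prevalence known,
# prev = f * tested_prev + (1 - f) * untested_prev, where f is the
# fraction tested, so the untested average is pinned down.
frac_tested = tested.mean()
implied_untested = (true_prev - frac_tested * tested_prev) / (1.0 - frac_tested)

print(f"overall prevalence:       {true_prev:.3f}")
print(f"prevalence among tested:  {tested_prev:.3f}")
print(f"implied among untested:   {implied_untested:.3f}")
```

In this toy setup the tested group has a higher positive rate than the population, and the untested group a lower one. The identity only constrains the average risk among untested patients; assigning risk to individual untested patients still requires a model such as the one the abstract proposes.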