

Poster in Workshop: Empowering Communities: A Participatory Approach to AI for Mental Health

Accessible and fair machine learning models for risk prediction of schizophrenia spectrum disorders

Marina Camacho · Polyxeni Gkontra · Angélica Atehortúa · Karim Lekadir


Abstract:

Schizophrenia spectrum disorders (SSZ) affect more than 24 million individuals worldwide. They present an acute onset of psychotic symptoms such as delusions, hallucinations, perceptual disturbances, and severe disruption of ordinary behavior, which affect the wellbeing of individuals. Despite recent advances in risk prediction models, there remain important gaps in the literature, particularly a lack of evaluation with large samples and external datasets, as well as concerns regarding potential bias and discrimination. Furthermore, the current state-of-the-art risk models are based on electronic health records, electroencephalograms, and genetic data, which are acquired in medical centres using expensive equipment, limiting widespread access to such tools by the general population. Novel fair models to identify individuals at high risk and modifiable risk factors are therefore essential to improve risk prediction of SSZ.
To tackle these limitations, we developed and validated a novel, accessible and fair ML model for risk prediction of SSZ. From UK Biobank, a large longitudinal cohort, we identified and included 591 participants who were diagnosed with schizophrenia, schizotypal and delusional disorders after the baseline assessment visit. An equal number of healthy participants were selected as the control group by matching on age and sex using propensity scores. This resulted in a total of 1182 participants being selected for our study; 1064 participants from 18 of the 22 UK Biobank assessment centres were used in nested cross-validation, and 306 participants from the remaining four centres were selected for external validation. We considered data from the participants’ baseline visit and selected 198 factors related to life course exposures, blood biochemistry and haematology. Subsequently, we performed data imputation to account for missing patient data.
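The centre-based split described above (some assessment centres used for cross-validation, the rest held out for external validation) can be illustrated with a minimal sketch. The function name `split_by_centre` and the centre labels are hypothetical and chosen only for illustration; this is not the authors' code, and it makes no claim about which centres were held out in the study.

```python
from typing import Iterable, List, Sequence, Tuple


def split_by_centre(
    centres: Sequence[str], external_centres: Iterable[str]
) -> Tuple[List[int], List[int]]:
    """Return (internal_idx, external_idx) so that no centre spans both sets.

    `centres[i]` is the assessment centre of participant i (hypothetical labels).
    Participants from `external_centres` are held out entirely for external
    validation; everyone else is available for (nested) cross-validation.
    """
    held_out = set(external_centres)
    internal_idx: List[int] = []
    external_idx: List[int] = []
    for i, centre in enumerate(centres):
        if centre in held_out:
            external_idx.append(i)
        else:
            internal_idx.append(i)
    return internal_idx, external_idx


# Toy example with made-up centre labels (not UK Biobank data).
toy_centres = ["A", "B", "C", "A", "D"]
internal, external = split_by_centre(toy_centres, external_centres={"C", "D"})
```

Splitting on the centre rather than on the individual participant keeps the external cohort free of any site that contributed to model selection, which is the point of an external validation set.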
We evaluated different machine learning models to identify individuals at risk of schizophrenia spectrum disorders after the baseline visit: Logistic Regression, Support Vector Machines, Random Forest, AdaBoost and XGBoost. We assessed the models’ performance in terms of AUC, F1-score, precision, and sensitivity. Moreover, we evaluated the fairness of the best-performing models by means of statistical parity difference and disparate impact ratio to identify and mitigate potential biases related to ethnicity, sex, birth, education and material deprivation. We interpreted the results by estimating feature importance using SHapley Additive exPlanations (SHAP) values.
Our results demonstrate that machine learning models based on accessible exposome variables, such as Townsend deprivation and diet, can reliably identify individuals at risk of schizophrenia, schizotypal and delusional disorders. Haematological data slightly improve the results in terms of accuracy. For the task at hand, XGBoost outperforms the other models, with the best fair model achieving an AUC of 0.822 and 0.796 in the internal and external validation cohorts, respectively. These preliminary results show promise for further investigation of accessible and fair ML models in mental health that will benefit the general population across various ethnic, sex, age and socioeconomic groups.
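For readers unfamiliar with the two fairness metrics named in the abstract, the following is a minimal sketch using their common definitions: statistical parity difference is the positive-prediction rate of the unprivileged group minus that of the privileged group (0 means parity), and disparate impact ratio is the quotient of the same two rates (1 means parity). The function names, group labels and toy predictions are assumptions for illustration only; the abstract does not specify the authors' implementation or which groups were treated as privileged.

```python
from typing import Hashable, Sequence


def selection_rate(preds: Sequence[int], groups: Sequence[Hashable], group: Hashable) -> float:
    """Fraction of members of `group` that received a positive (1) prediction."""
    members = [p for p, g in zip(preds, groups) if g == group]
    if not members:
        raise ValueError(f"no samples found for group {group!r}")
    return sum(members) / len(members)


def statistical_parity_difference(preds, groups, unprivileged, privileged) -> float:
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged)."""
    return selection_rate(preds, groups, unprivileged) - selection_rate(preds, groups, privileged)


def disparate_impact_ratio(preds, groups, unprivileged, privileged) -> float:
    """P(pred = 1 | unprivileged) / P(pred = 1 | privileged)."""
    privileged_rate = selection_rate(preds, groups, privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no positive predictions; ratio undefined")
    return selection_rate(preds, groups, unprivileged) / privileged_rate


# Toy binary predictions for two hypothetical groups "a" and "b" (made-up data).
toy_preds = [1, 0, 1, 1, 0, 1, 0, 0]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(toy_preds, toy_groups, unprivileged="b", privileged="a")
dir_ = disparate_impact_ratio(toy_preds, toy_groups, unprivileged="b", privileged="a")
```

In this toy data group "a" is flagged in 3 of 4 cases and group "b" in 1 of 4, so both metrics move away from their parity values (0 and 1 respectively).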
