Shuvechchha Ghimire examines the issues ethnic minorities face with machine learning in medical technology and healthcare systems. She is a graduate in Electronics and Computer Engineering from the University of Brighton and the founder of UNITEC Solutions Ltd, which provides customised enterprise automation solutions.

Machine Learning (ML) facilitated rapid technological intervention in Covid-19 vaccine development, disease prognosis, diagnosis, treatment, and clinical workflow. ML-based assistive technologies saved valuable time and resources in predicting the deterioration of infected patients. ML has likewise been widely used to detect cancerous cells and to reconstruct images in procedures such as endoscopy. Moving forward, ML has the potential to provide tailored healthcare to individuals and to support remote healthcare services in rural and low-income areas.
ML-based models, however, can be susceptible to biases. Biases can be introduced at every stage of the ML design process: data collection, data annotation, model training, deployment, and testing.
  1. Complex socio-economic factors

A survey of 17 million adults in England reported that ethnic minorities were at greater risk of infection, hospitalisation, and death in the second wave of Covid-19 than in the first. Using health data from 40% of GPs in England, the study suggested that residents with an Indian, Pakistani, or Bangladeshi background were more likely to test positive, require hospital treatment, and die. Similarly, a study published on medRxiv indicated that young women of lower income, lower education level, and an ethnic minority background were less likely to take up vaccination in the UK.
These patterns demonstrate the complex socio-economic factors shaping access to, literacy in, and uptake of health services across ethnic subpopulations. Datasets that carry these patterns into ML models can result in insufficient validation, inconsistent reliability, and poor generalisability. Such models can, in turn, entrench systemic inequalities and marginalisation, disproportionately impacting minorities.

  2. Data-related vulnerabilities

Oversampling and undersampling of potentially dispersed, heterogeneous data can distort the corresponding ML model. Disparities in living and working conditions and differential access to advanced wearables and insurance can leave poorer populations with less detailed health data than their wealthier counterparts. A model trained on such an unequal dataset can produce distorted outputs for the under-represented subpopulation.
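To make the mechanism concrete, here is a minimal sketch in Python, assuming two synthetic subpopulations whose group shifts, sample sizes, and features are all illustrative inventions rather than real health data. It shows how training dominated by one group can degrade accuracy for the other:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Each group's health signal relates to the outcome slightly differently.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > shift).astype(int)
    return X, y

# Group A dominates the training data; Group B is badly under-sampled.
Xa, ya = make_group(n=5000, shift=0.0)   # well-represented subpopulation
Xb, yb = make_group(n=100, shift=1.5)    # under-represented subpopulation

model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate on fresh, equally sized test sets for each group.
Xa_t, ya_t = make_group(n=1000, shift=0.0)
Xb_t, yb_t = make_group(n=1000, shift=1.5)
print("Group A accuracy:", accuracy_score(ya_t, model.predict(Xa_t)))
print("Group B accuracy:", accuracy_score(yb_t, model.predict(Xb_t)))
```

Because the fitted decision boundary is pulled towards the dominant group, the model scores noticeably worse on Group B, despite both groups being equally easy to model in isolation.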
Similarly, inter-relations of symptoms create complex datasets. Heterogeneous data from genomics, medical procedures, social media, and the environment are used to build contextual and causal awareness of a patient’s profile. Because these data are collected in silos, standardisation and exchange become difficult, adding a further complication to deriving a fair ML model.

  3. Manifestation of professional and personal bias

Deriving a causal model of a patient’s disease requires cross-department collaboration, and complex biases can be introduced if standardised metrics are not used in the process. For instance, in radiology, multiple structured and unstructured data are collected, including images, radiologists’ notes, and patient statistics. A report prepared by a senior radiologist can differ from one prepared by a less-experienced colleague, and the observations of two radiologists can diverge because of historic norms, unconscious bias, and the geographical region of practice. Notes prepared by the two can therefore yield inconsistent ML models. Automatic tagging and description of medical images, disease classification, and automatic report generation are a few other ML-based applications differentially impacted by the deep-rooted biases of healthcare personnel.
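One standard way to quantify such annotator disagreement before any training takes place is an inter-rater agreement statistic such as Cohen’s kappa. A minimal sketch, assuming two hypothetical radiologists labelling the same ten scans (the labels are invented for illustration):

```python
from sklearn.metrics import cohen_kappa_score

# Findings tagged on the same ten scans by a senior and a junior radiologist.
senior = ["nodule", "clear", "nodule", "clear", "clear",
          "nodule", "clear", "nodule", "nodule", "clear"]
junior = ["nodule", "clear", "clear",  "clear", "nodule",
          "nodule", "clear", "clear",  "nodule", "clear"]

# 1.0 means perfect agreement; 0 means agreement no better than chance.
print(f"Cohen's kappa: {cohen_kappa_score(senior, junior):.2f}")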
AI developers, designers, and researchers can also hold intrinsic biases that influence the choices they make during ML model development. Institutional racism and unconscious prejudice can lead to discrimination being built into ML models and the corresponding healthcare products.

  4. Security and robustness

One cannot rule out cyber-attacks when designing data-driven healthcare products. Attacks can take the form of unrepresentative training data targeted to disadvantage individuals of a particular racial, religious, ethnic, or socio-economic background. While proxies and context-aware reinforcement learning models have been proposed to prevent such attacks, secure ML model design is not a one-off process. Complexity therefore arises in determining qualifiable and quantifiable metrics through which continued fairness can be ensured.
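As one example of a quantifiable fairness metric that could be monitored continuously rather than checked once, the sketch below computes the demographic parity difference, i.e. the gap in positive-prediction rates between two groups. The predictions and group labels are invented for illustration:

```python
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # model decisions
group  = np.array(["A", "A", "A", "A", "A",
                   "B", "B", "B", "B", "B"])          # protected attribute

rate_a = y_pred[group == "A"].mean()   # selection rate for group A
rate_b = y_pred[group == "B"].mean()   # selection rate for group B
print(f"Demographic parity difference: {abs(rate_a - rate_b):.2f}")  # 0 = parity
```

Re-computing a metric like this on live predictions would flag drift towards unequal treatment, whether the drift stems from benign data changes or from a deliberate data-poisoning attack.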
The lack of a standardised framework for fairness in healthcare ML suggests that current advancements may not be ready to guarantee equitable healthcare to ethnic minorities. If these intrinsic ML biases go unquestioned, they risk entrenching existing inequalities for years to come. Perhaps a way forward is to engage end-consumers of all ethnic groups in bias detection and mitigation. Transparent and interpretable ML models should become the norm in healthcare.