Minority bias assessment in healthcare

Machine learning algorithms are now used in many aspects of daily life, from speech recognition on smartphones to fraud prevention in finance. The medical field deserves particular attention, since collecting high-quality data is essential for obtaining accurate results. However, data quality alone is not enough for a result to be considered accurate and reliable. Fairness, which can be understood as the absence of bias in a dataset, matters just as much: biased data can lead to wrong results that directly endanger patients' lives. It is therefore essential to identify and study systems capable of managing bias, evaluate their capabilities, test them in the healthcare domain, and build a catalogue of these systems that brings together, from a theoretical point of view, the different fairness measures each one adopts. In this thesis, we focus on a specific bias in the healthcare field: Minority Bias. Starting from an existing methodology that defines the steps for identifying and measuring this type of bias, we conduct both a theoretical and a practical study using three systems able to measure and mitigate it: Ranking Facts, AI Fairness 360, and FairLearn. The studies were performed on three real-world datasets. The first two concern diabetes: one is diagnostic and aims to predict diabetes in female patients, while the other studies therapy changes in diabetic patients. The third concerns heart disease. The results allowed us to validate the pipeline on which this thesis is based, to compile a catalogue of the metrics used to identify this type of bias, to determine whether a predefined technique for mitigating it exists, and to establish how to detect its presence in the diagnostic field.

Today algorithms are used in many aspects of our lives, and particular attention is paid to the medical field, where collecting quality data is fundamental. However, data quality is not the only thing that matters: Fairness, which we can view as the absence of bias in the data, is equally important. The presence of bias in data can lead to incorrect results, directly compromising the individual's health. For this reason, it is essential to find and study systems able to manage bias, evaluate their capabilities, and test them in the healthcare domain, creating a catalogue based on their capabilities that brings together, from a theoretical point of view, the different fairness measures adopted by each system. In this thesis we focused on a particular bias in the Healthcare field, namely Minority Bias. We first carried out a theoretical study of it and then studied it in practice through three systems: Ranking Facts, AI Fairness 360, and FairLearn. The studies examined this bias on three datasets. The first two concern diabetes: the first is diagnostic and aims to predict diabetes in female patients, while the second evaluates whether a diabetic patient returns to hospital with a consequent change of dose. The third concerns cardiovascular diseases. The results allowed us to clarify whether a predefined technique exists for removing this type of bias, how to measure it efficiently in the diagnostic context, and to validate the pipeline used to detect it.
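Neither abstract spells out how minority bias is quantified. As a rough illustration only, the sketch below computes each group's share of a dataset (a crude indicator of under-representation) together with two group-fairness quantities commonly reported by toolkits such as AI Fairness 360: statistical parity difference and disparate impact. The helper names, the toy data and the group labels "A"/"B" are hypothetical. This is plain Python, not the API of AI Fairness 360, FairLearn or Ranking Facts, and not necessarily the exact set of metrics catalogued in this thesis.

```python
from collections import Counter
from typing import Dict, Sequence


def group_representation(groups: Sequence[str]) -> Dict[str, float]:
    """Share of records in each group; a small share flags a potential minority."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


def selection_rate(y_pred: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Fraction of positive predictions (label 1) within one group."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)


def statistical_parity_difference(y_pred, groups, unprivileged, privileged) -> float:
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged); 0 means parity."""
    return selection_rate(y_pred, groups, unprivileged) - selection_rate(y_pred, groups, privileged)


def disparate_impact(y_pred, groups, unprivileged, privileged) -> float:
    """Ratio of the two selection rates; 1 means parity, values below 0.8 are often flagged."""
    return selection_rate(y_pred, groups, unprivileged) / selection_rate(y_pred, groups, privileged)


if __name__ == "__main__":
    # Hypothetical toy data: 6 patients in group "A", 4 in the smaller group "B".
    groups = ["A"] * 6 + ["B"] * 4
    y_pred = [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0]
    print(group_representation(groups))                              # {'A': 0.6, 'B': 0.4}
    print(statistical_parity_difference(y_pred, groups, "B", "A"))  # -0.25
    print(disparate_impact(y_pred, groups, "B", "A"))               # 0.5
```

Representation alone says nothing about whether the model treats the smaller group differently, which is why the sketch pairs it with outcome-based metrics computed on the model's predictions.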