Gender discrimination in data analysis: a socio-technical approach
CORONA, RICCARDO
2020/2021
Abstract
Technology characterizes and facilitates our daily lives, but its pervasive use can introduce or exacerbate social problems. Because of their intrinsic complexity, these problems need to be addressed from different but complementary perspectives, which are provided by two disciplines of very different natures: data science and sociology. Specifically, this thesis aims to be a bridge between the technical field of data analysis and a specific category of social problems, namely discrimination and, in particular, gender discrimination. To work within this context, we use an approach that takes data analysis as its starting point and finds in sociology a useful supporting instrument, as well as a source of requirements. We investigate in depth the sociological reasons behind gender discrimination in the society of interest, the American one, introducing and exploring what is commonly referred to as the 'gender gap'. We then carry out several experiments on data about U.S. employees, focusing on the economic perspective (the gender pay gap) while taking into account the other facets of the problem. The main contributions of this thesis derive from applying preprocessing techniques and using tools designed to detect bias in data. With these, we try to understand which design choices have the greatest impact on the so-called 'fairness' of the results, and we highlight the strengths and weaknesses of the techniques and tools. We emphasize the importance of a multidisciplinary approach to problems of this kind, which is essential for obtaining information about the complex context in which data are embedded.

File | Size | Format |
---|---|---|
2021_10_Corona.pdf (accessible online to everyone). Description: Thesis text | 3.4 MB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/179070