In this thesis we chose to consider and analyze some of the ethical aspects that arise during data extraction. More specifically, the starting point of our research was a comparison between two concepts in data extraction, fairness and diversity, and in particular the observation that, if stretched, the concept of diversity can be used as a definition of fairness. After discussing these two concepts at length and attempting to define them formally, together with the related problems and the solutions adopted, we analyze complex networks, and specifically the diversity metrics that can be computed from this kind of network. We use these measures to classify the nodes of a network based on the values of the metrics extracted during its construction. The aim of this research is to use classification models, specifically a logistic regression model and a random-forest-based model, to show that there is a correlation between the metrics typical of complex networks and the diversity and fairness measures typical of data extraction. We then discuss the results obtained with the first classification method, logistic regression, since it is the simpler model for interpreting the results.
Un'applicazione delle metriche delle reti complesse per valutare la diversity (An application of complex-network metrics to evaluate diversity)
CATTIVELLI, CAROLINA
2017/2018
Abstract
This thesis analyzes some of the ethical issues that arise in data extraction. The starting point of our research was a comparison between two notions found in data extraction, fairness and diversity. One of the main findings of this comparison is that, in some cases, the concept of diversity, if stretched far enough, can serve as the very definition of fairness. After describing these two concepts formally, together with the related problems and the solutions adopted, we analyze complex networks, and specifically the diversity metrics that can be computed from a network. We use these measures to classify the nodes of a network based on the metric values extracted while the network was built. The aim of this research is to use classification models, namely a logistic regression model and a random-forest-based model, to demonstrate a correlation between the metrics of a complex network and features typical of data extraction, such as diversity and fairness. Finally, we highlight the results obtained with the first classification method, logistic regression, because it is the simpler model to interpret.
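The abstract does not specify which metrics, data or labels were used, so the following is only a minimal sketch of the general pipeline it describes (compute a node-level diversity metric from a network, then classify nodes with logistic regression). It assumes a toy undirected network whose nodes carry a categorical group attribute, and it uses the Shannon entropy of the groups among a node's neighbours as one possible diversity metric, alongside node degree. The network (`EDGES`), attribute (`GROUP`), target labels (`LABEL`) and every function name are hypothetical illustrations, not the thesis's metrics or data; the random-forest model is not shown, and the classifier is a plain gradient-descent logistic regression rather than any specific library.

```python
import math
from collections import Counter

# Hypothetical toy data: undirected edge list, a categorical group per node,
# and a binary target label per node. None of this comes from the thesis.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4), (2, 5)]
GROUP = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "a", 6: "c", 7: "c"}
LABEL = {0: 1, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0}


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighbour_entropy(node, adj, group):
    """Shannon entropy (bits) of the group labels among a node's neighbours.
    Higher values mean a more diverse neighbourhood (hypothetical metric)."""
    counts = Counter(group[n] for n in adj[node])
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def features(adj, group):
    """Per-node feature vector: [degree, neighbour-group entropy]."""
    return {n: [float(len(adj[n])), neighbour_entropy(n, adj, group)] for n in sorted(adj)}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob(w, b, x):
    """Predicted probability of class 1 for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def log_loss(w, b, X, y):
    """Mean binary cross-entropy, clamped to avoid log(0)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(prob(w, b, xi), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = prob(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Hard 0/1 prediction with a 0.5 threshold."""
    return 1 if prob(w, b, x) >= 0.5 else 0


if __name__ == "__main__":
    adj = adjacency(EDGES)
    feats = features(adj, GROUP)
    nodes = sorted(feats)
    X = [feats[n] for n in nodes]
    y = [LABEL[n] for n in nodes]
    w, b = fit_logistic(X, y)
    print("weights [degree, entropy]:", w, "bias:", b)
    print("predictions:", {n: predict(w, b, feats[n]) for n in nodes})
```

Because the model is linear in the log-odds, each learned weight can be read directly: a positive weight on the entropy feature means that, other features being equal, a more diverse neighbourhood raises the predicted probability of class 1. This direct readability is the kind of interpretability that makes logistic regression the simpler model to discuss.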
File: 2019_04_Cattivelli.pdf (Adobe PDF, 2.75 MB), accessible online to everyone.
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/147957