An application of complex network metrics to evaluate diversity (original title: Un'applicazione delle metriche delle reti complesse per valutare la diversity)



CATTIVELLI, CAROLINA

2017/2018

Abstract

This thesis examines some of the ethical issues that arise in data extraction. The starting point is a comparison between two notions relevant to data extraction: fairness and diversity. One of the main findings of this comparison is that, in some cases, diversity, when suitably enforced, can serve as the very definition of fairness. After giving a formal description of these two concepts, the related problems, and the solutions adopted, we turn to complex networks, and specifically to the diversity metrics that can be computed from a network. We use these measures to classify the nodes of a network, based on the metric values extracted while the network was built. The aim of the research is to use classification models, in particular a logistic regression model and a random-forest-based model, to demonstrate a correlation between complex-network metrics and features typical of data extraction, such as diversity and fairness. Finally, the results of the first classification method, logistic regression, are discussed, since that model is the simpler one to interpret.

Full record

	Advisor
	
				TANCA, LETIZIA
			
	Co-advisor(s)
	
				AZZALINI, DAVIDE
			
	School / Dept.
	
				ING - Scuola di Ingegneria Industriale e dell'Informazione
			
	Date
	
				16 Apr 2019
			
	Academic year
	
				2017/2018
			
	Abstract (translated from the Italian)
	
				In this thesis we consider and analyze some of the ethical aspects encountered during data extraction. In more detail, the starting point of our research was a comparison between two concepts, fairness and diversity in data extraction, and in particular the observation that the concept of diversity, if enforced, can be used as a definition of fairness.
After discussing these two concepts at length and attempting to define them formally, together with the related problems and the solutions adopted, we analyze complex networks, specifically the diversity metrics that can be computed from this type of network.
We use these measures to classify the nodes that make up a network, based on the values of the metrics extracted during its creation. The goal of this research is to use classification models, specifically a logistic regression model and a random-forest-based model, to demonstrate a correlation between the metrics typical of complex networks and the diversity and fairness measures typical of data extraction.
Finally, we discuss the results obtained with the first classification method, logistic regression, as it is the simpler model to interpret.
			
	Document type
	
				Master's thesis
			
	Appears in document types:
	
				Master's thesis

Attached files

File	Size	Format
2019_04_Cattivelli.pdf (openly accessible online)	2.75 MB	Adobe PDF

Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/147957