Analisi di forma dei profili ChIP-Seq (Shape analysis of ChIP-Seq profiles)

POLITesi - Digital archive of degree and doctoral theses




PARODI, ALICE CARLA LUISA

2012/2013

Abstract

This thesis, set in the context of epigenetic research, concerns the characterization of the shape of the data produced by ChIP-Sequencing (ChIP-Seq) analysis of protein-DNA interaction. The work first describes in detail how the data are collected and processed, and then analyses them. It begins with a survey of the statistical model underlying the data, evaluating the hypotheses proposed in the literature, and proposes a suitable generative model. Moving on to the statistical analysis proper, the project investigates whether the different types of protein-DNA interaction can be distinguished by unsupervised classification based on the shape of the data. The clustering relies on the k-means algorithm, applied first to shape indices chosen to represent the data and then to the overall structure of the data. In the latter case the algorithm is adapted into k-mean alignment, which identifies the components of variability relevant to the classification of functional data. The last part of the thesis offers some biological interpretations of the results and suggests ideas and objectives for further development of the project.
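The first clustering stage (k-means on shape indices) can be sketched as follows. The record does not say which shape indices the thesis uses. The two below are hypothetical stand-ins: the fraction of bins at or above half the peak height, and the skewness of the profile treated as a distribution over positions. The profiles are synthetic, and `shape_indices` and `kmeans` are illustrative names. This is a minimal NumPy sketch of plain k-means (Lloyd's algorithm), not the author's code.

```python
import numpy as np


def shape_indices(profile):
    """Two hypothetical shape indices for a 1-D coverage profile (read counts per bin)."""
    p = np.asarray(profile, dtype=float)
    x = np.arange(len(p), dtype=float)
    w = p / p.sum()                            # treat the profile as a distribution over positions
    mean = np.sum(w * x)
    sd = np.sqrt(np.sum(w * (x - mean) ** 2))
    skewness = np.sum(w * (x - mean) ** 3) / sd ** 3
    width = np.mean(p >= p.max() / 2)          # fraction of bins at or above half the peak height
    return np.array([width, skewness])


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


# Synthetic profiles (assumption): three narrow symmetric peaks, three broad right-skewed ones.
x = np.arange(100, dtype=float)
narrow = [np.exp(-0.5 * ((x - 50) / s) ** 2) for s in (3, 4, 5)]
t = np.clip(x - 20, 0, None)
skewed = [t * np.exp(-t / b) for b in (6, 8, 10)]

F = np.array([shape_indices(p) for p in narrow + skewed])
Z = (F - F.mean(axis=0)) / F.std(axis=0)       # standardize so the indices share a common scale
labels, centers = kmeans(Z, k=2)
print(labels)
```

Standardizing the indices before clustering keeps one index from dominating the Euclidean distance merely because of its units.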


	Supervisor
	
				SECCHI, PIERCESARE
			
	Co-supervisor(s)
	
				CREMONA, MARZIA
			
	School / Dept.
	
				ING - School of Industrial and Information Engineering
			
	Date
	
				3-Oct-2013
			
	Academic year
	
				2012/2013
			
	Italian abstract (English translation)
	
				This thesis is set in the context of epigenetic research and concerns the shape characterization of data obtained by ChIP-Sequencing analysis of protein-DNA interaction. In particular, this work describes in detail the technique used to collect and process the data, and then proceeds to a statistical analysis of the resulting signal. First, it investigates the statistical model underlying the data, evaluating the hypotheses proposed in the literature and formulating a suitable generative model. Moving into the details of the statistical analysis, it then investigates whether the different types of protein-DNA interaction can be distinguished by unsupervised classification techniques based on the shape of the data. The clustering relies on the k-means algorithm, evaluated first on shape indices chosen to represent the data and then on the functional data as a whole. For this purpose it is necessary to introduce k-mean alignment, an adaptation of the algorithm to functional data that also allows the data under study to be registered (aligned). The thesis concludes with some considerations on the biological interpretation of the results and presents some ideas and objectives for further development of the project.
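The idea behind k-mean alignment can be sketched under strong simplifying assumptions. The method clusters curves and registers them jointly, alternating three steps: template identification, assignment with alignment of each curve, and normalization of the warpings. In this sketch, warping is restricted to integer shifts and similarity is the squared L2 distance. Templates are initialized farthest-first, and the curves are synthetic single and double peaks. The names `shift`, `best_alignment` and `kmean_alignment` are hypothetical. The published method typically allows richer warping classes, such as affine transformations, and other similarity indices, so this is not the thesis implementation.

```python
import numpy as np


def shift(y, s):
    """Translate a 1-D profile by s bins (s > 0 moves it right), zero-filling vacated bins."""
    out = np.zeros_like(y)
    if s >= 0:
        out[s:] = y[:len(y) - s]
    else:
        out[:s] = y[-s:]
    return out


def best_alignment(y, template, max_shift):
    """Return (squared L2 distance, shift) for the best integer shift of y onto template."""
    best_d, best_s = np.inf, 0
    for s in range(-max_shift, max_shift + 1):
        d = float(np.sum((shift(y, s) - template) ** 2))
        if d < best_d:
            best_d, best_s = d, s
    return best_d, best_s


def kmean_alignment(Y, k, max_shift, n_iter=20):
    """Shift-only sketch of k-mean alignment: cluster and register curves jointly."""
    # Farthest-first initialization (our choice): start from Y[0], then add the curve
    # whose aligned distance to its nearest template is largest.
    templates = [Y[0].copy()]
    while len(templates) < k:
        gaps = [min(best_alignment(y, t, max_shift)[0] for t in templates) for y in Y]
        templates.append(Y[int(np.argmax(gaps))].copy())
    templates = np.array(templates)
    for _ in range(n_iter):
        # Assignment and alignment: each curve joins the template it matches best after shifting.
        fits = [[best_alignment(y, t, max_shift) for t in templates] for y in Y]
        labels = np.array([int(np.argmin([d for d, _ in f])) for f in fits])
        shifts = np.array([f[j][1] for f, j in zip(fits, labels)])
        # Normalization: re-centre shifts within each cluster so the templates do not drift.
        for j in range(k):
            members = labels == j
            if members.any():
                shifts[members] -= int(round(shifts[members].mean()))
        aligned = np.array([shift(y, s) for y, s in zip(Y, shifts)])
        # Template identification: mean of the aligned curves in each cluster.
        new = np.array([aligned[labels == j].mean(axis=0) if np.any(labels == j) else templates[j]
                        for j in range(k)])
        if np.allclose(new, templates):
            break
        templates = new
    return labels, shifts, templates


# Synthetic curves (assumption): single and double peaks, each placed at three positions.
x = np.arange(140, dtype=float)


def bump(c, sigma=3.0):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)


singles = [bump(c) for c in (50, 60, 70)]
doubles = [bump(c - 6) + bump(c + 6) for c in (50, 60, 70)]
Y = np.array(singles + doubles)
labels, shifts, templates = kmean_alignment(Y, k=2, max_shift=20)
print(labels, shifts)
```

Without the alignment step, a single peak here is closer in L2 to a double peak at the same position than to the same single peak shifted by 10 bins. This is why shape-based clustering of such profiles benefits from registering the curves before comparing them.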
			
	Document type
	
				Master's thesis (Tesi di laurea Magistrale)
			
	Appears in collections:
	
				Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Size	Format
2013_10_Parodi.pdf (open access; description: thesis text)	6.14 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/85228