Extending PageRank algorithm in new contexts

POLITesi - Digital archive of degree and doctoral theses


GEROSA, PAOLO

2021/2022

Abstract

The PageRank algorithm, also known as the Google algorithm, was introduced in 1999 by Larry Page, one of the two founders of Google, and Google still uses it in part to rank webpages in its search engine. One interesting aspect of the algorithm is that it starts from a very complex problem and arrives at a simple but effective solution. The name Google itself recalls an astronomically large number (googol = 10^100), reflecting the enormous scale of the problem that search engines have to solve. Given a query from an arbitrary user, the aim is to order by importance all the pages related to that query. The algorithm builds a graph whose nodes are webpages and whose connections are the hyperlinks between them, and it returns a score for each page; this score determines the page's position in the final ranking. The higher the score, the higher the page appears in the list returned to the user. This work aims to apply PageRank in different contexts to show that the algorithm is still valuable for ranking a list of objects connected to one another. The algorithm is complemented with statistical tools to make it more powerful in each context analyzed. The final aim is to enrich the set of tools available for solving so-called ranking problems.
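To make the scoring step in the abstract concrete, the following is a minimal sketch of the textbook power-iteration form of PageRank. It is not taken from the thesis. The function name `pagerank`, the toy graph `toy_web`, the damping factor of 0.85 and the uniform treatment of pages without outgoing links are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Score pages by power iteration.

    links: dict mapping each page to the list of pages it links to.
    damping: assumed conventional value, not a figure from the thesis.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}  # start from a uniform score

    for _ in range(max_iter):
        # Score held by pages with no outgoing links is spread over all pages.
        dangling = sum(score[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its score, split evenly, to the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new[t] += damping * score[p] / len(targets)
        delta = sum(abs(new[p] - score[p]) for p in pages)
        score = new
        if delta < tol:  # stop once the scores no longer change
            break
    return score


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, page C collects links from three pages and ends up first, while page D, which no page links to, keeps only the baseline score and ends up last.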


Supervisor: CAMPI, ALESSANDRO
School / Dept.: ING - School of Industrial and Information Engineering
Date: 20 December 2022
Academic year: 2021/2022
Abstract (translated from the Italian):

The PageRank algorithm, or Google algorithm, was introduced in 1999 by Larry Page, one of the founders of Google. The algorithm is still partially used in the search engine developed by Google. One of its interesting and innovative aspects is that it solves a fundamentally very complicated problem by applying basic concepts of mathematics and statistics. Indeed, the name Google derives from an astronomically large number (googol = 10^100), reflecting the complexity of the problem the search engine has to solve.

Given a query from an arbitrary user, the goal is to find an order of importance for the pages related to the query. The algorithm uses a directed network in which the connections between nodes (webpages) are based on the hyperlinks between webpages; the final ranking of the webpages is determined by a score computed by the algorithm. The higher a page's score, the higher its position in the ranking.

This work aims to apply PageRank in different contexts to show that the algorithm is still valuable for ranking a list of objects. The algorithm is complemented with applied-statistics tools to make it adaptable to the different contexts analyzed. The final goal is to enrich the existing models for ranking lists of objects with new methods.
Document type: Master's thesis (Tesi di laurea Magistrale)

Attached files

File	Description	Access	Size	Format
2022_12_Gerosa.pdf	Thesis text	Openly accessible on the internet	1.67 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/196116