Proximity rank join is the problem of finding the top-K combinations with the highest aggregate score in which the best combinations of objects coming from different services are sought, and each object is equipped with both a score and a real-valued feature vector. The proximity of the objects i.e. the geometry of the feature space plays a distinctive role in the computation of the overall score of a combination. Proximity rank join problem can capture many interesting scenarios e.g. finding similar documents in different data collections given a set of keywords, requesting similar images from different archives given a sample image, to name a few, if the notion of proximity is replaced with the notion of similarity. For this reason, the main goal of this thesis work is the incorporation of the cosine similarity measure into the proximity rank join algorithm. But this incorporation results in an optimization problem which is computationally demanding and complex in nature. Therefore our incorporation is achieved via solving an approximated version of the original optimization problem by making an assumption which we have found to be totally in line with the concept of top-K combinations. Moreover, the solution of our approximated optimization problem has excellent properties in that with the increase of dimension of the feature space (from 2D to 3D) the normalized (in terms of the maximum possible aggregate score) deviation between the true aggregate score and the approximated aggregate score tends to drop and like the true solution it also avoids the triviality,which is unrealistic, that occurs in the solutions of other approximated variants of the original optimization problem.

Proximity rank join based on cosine similarity

AHMMED, ASHEK
2009/2010

Abstract

Proximity rank join is the problem of finding the top-K combinations with the highest aggregate score in which the best combinations of objects coming from different services are sought, and each object is equipped with both a score and a real-valued feature vector. The proximity of the objects i.e. the geometry of the feature space plays a distinctive role in the computation of the overall score of a combination. Proximity rank join problem can capture many interesting scenarios e.g. finding similar documents in different data collections given a set of keywords, requesting similar images from different archives given a sample image, to name a few, if the notion of proximity is replaced with the notion of similarity. For this reason, the main goal of this thesis work is the incorporation of the cosine similarity measure into the proximity rank join algorithm. But this incorporation results in an optimization problem which is computationally demanding and complex in nature. Therefore our incorporation is achieved via solving an approximated version of the original optimization problem by making an assumption which we have found to be totally in line with the concept of top-K combinations. Moreover, the solution of our approximated optimization problem has excellent properties in that with the increase of dimension of the feature space (from 2D to 3D) the normalized (in terms of the maximum possible aggregate score) deviation between the true aggregate score and the approximated aggregate score tends to drop and like the true solution it also avoids the triviality,which is unrealistic, that occurs in the solutions of other approximated variants of the original optimization problem.
ING V - Facolta' di Ingegneria dell'Informazione
22-ott-2010
2009/2010
Tesi di laurea Magistrale
File allegati
File Dimensione Formato  
thesis_ashek_737737.pdf

accessibile in internet per tutti

Descrizione: Thesis text
Dimensione 5.67 MB
Formato Adobe PDF
5.67 MB Adobe PDF Visualizza/Apri

I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10589/3786