Proximity rank join is the problem of finding the top-K combinations with the highest aggregate score in which the best combinations of objects coming from different services are sought, and each object is equipped with both a score and a real-valued feature vector. The proximity of the objects i.e. the geometry of the feature space plays a distinctive role in the computation of the overall score of a combination. Proximity rank join problem can capture many interesting scenarios e.g. finding similar documents in different data collections given a set of keywords, requesting similar images from different archives given a sample image, to name a few, if the notion of proximity is replaced with the notion of similarity. For this reason, the main goal of this thesis work is the incorporation of the cosine similarity measure into the proximity rank join algorithm. But this incorporation results in an optimization problem which is computationally demanding and complex in nature. Therefore our incorporation is achieved via solving an approximated version of the original optimization problem by making an assumption which we have found to be totally in line with the concept of top-K combinations. Moreover, the solution of our approximated optimization problem has excellent properties in that with the increase of dimension of the feature space (from 2D to 3D) the normalized (in terms of the maximum possible aggregate score) deviation between the true aggregate score and the approximated aggregate score tends to drop and like the true solution it also avoids the triviality,which is unrealistic, that occurs in the solutions of other approximated variants of the original optimization problem.
Proximity rank join based on cosine similarity
AHMMED, ASHEK
2009/2010
Abstract
Proximity rank join is the problem of finding the top-K combinations with the highest aggregate score in which the best combinations of objects coming from different services are sought, and each object is equipped with both a score and a real-valued feature vector. The proximity of the objects i.e. the geometry of the feature space plays a distinctive role in the computation of the overall score of a combination. Proximity rank join problem can capture many interesting scenarios e.g. finding similar documents in different data collections given a set of keywords, requesting similar images from different archives given a sample image, to name a few, if the notion of proximity is replaced with the notion of similarity. For this reason, the main goal of this thesis work is the incorporation of the cosine similarity measure into the proximity rank join algorithm. But this incorporation results in an optimization problem which is computationally demanding and complex in nature. Therefore our incorporation is achieved via solving an approximated version of the original optimization problem by making an assumption which we have found to be totally in line with the concept of top-K combinations. Moreover, the solution of our approximated optimization problem has excellent properties in that with the increase of dimension of the feature space (from 2D to 3D) the normalized (in terms of the maximum possible aggregate score) deviation between the true aggregate score and the approximated aggregate score tends to drop and like the true solution it also avoids the triviality,which is unrealistic, that occurs in the solutions of other approximated variants of the original optimization problem.File | Dimensione | Formato | |
---|---|---|---|
thesis_ashek_737737.pdf
accessibile in internet per tutti
Descrizione: Thesis text
Dimensione
5.67 MB
Formato
Adobe PDF
|
5.67 MB | Adobe PDF | Visualizza/Apri |
I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/10589/3786