Cost-effective and scalable activity matching using crowdsourcing
SCIBONA, EDOARDO
2016/2017
Abstract
In organizations, process models enable informed decision-making by formalizing existing procedures. However, better alignment between process models is needed to improve interactions with the large collections that contain them. Currently, activity matching approaches can be used to examine the activities contained in processes and establish possible alignments. In this work, we study how to address the recall, scalability, and cost issues affecting current approaches by using human intelligence, specifically that found in large and varied groups of individuals, called crowds. Our main research questions are the following: Can crowdsourcing-based approaches be used to perform or support activity matching? How do such approaches perform with respect to identified correspondences, scalability, and costs? We present direct and indirect types of approaches. Direct approaches ask the crowd to identify corresponding activities; indirect approaches ask it to enrich activities with relevant data, which is then used to support activity matching algorithms. To evaluate our approaches, we implement four crowdsourcing experiments and develop data analysis algorithms to study the collected datasets. We also implement an activity matching algorithm based on cosine similarity.
Our results indicate that crowdsourcing-based approaches are capable of performing or supporting activity matching at scale; however, both performance and costs are significantly influenced by the designs and strategies adopted. In direct approaches, recall is higher than precision since the crowd identifies numerous correspondences, but with low accuracy. Yet, even false positives may provide valuable knowledge in the form of plausible activity pairs. In indirect approaches, precision is higher than recall since algorithms are more accurate but less able to identify correspondences. Also, care must be taken in the strategies used to select the data supplied to algorithms, to prevent poor performance.
Crowdsourcing properties allow both types of approaches to scale; however, cost-effective and less demanding approaches, such as indirect approaches or simpler direct ones, are preferable for large-scale activity matching. Finally, activity matching costs can be significantly reduced by using crowdsourcing-based approaches. In particular, indirect approaches represent the least expensive option. Also, the design strategies adopted in our direct approaches achieve considerable cost reductions with respect to previous, similar approaches.
File | Description | Access | Size | Format
---|---|---|---|---
2018_04_Scibona.pdf | Thesis text | Accessible online to everyone | 4.68 MB | Adobe PDF
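The abstract mentions an activity matching algorithm based on cosine similarity and reports results in terms of precision and recall. The following is only a minimal sketch of that general idea, not the thesis implementation: the bag-of-words representation, the `threshold` value, the function names, and the example labels and gold standard are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(label):
    """Lowercase an activity label and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", label.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bags of words (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_activities(labels_a, labels_b, threshold=0.4):
    """Return all (a, b) label pairs whose similarity reaches the threshold.

    The threshold of 0.4 is an arbitrary illustrative choice.
    """
    matches = set()
    for a in labels_a:
        for b in labels_b:
            score = cosine_similarity(Counter(tokens(a)), Counter(tokens(b)))
            if score >= threshold:
                matches.add((a, b))
    return matches


def precision_recall(found, gold):
    """Precision and recall of found correspondences against a gold standard."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical activity labels from two process models.
process_a = ["Check invoice", "Send payment"]
process_b = ["Validate bill", "Send payment to supplier"]

# Hypothetical gold standard of true correspondences.
gold = {
    ("Check invoice", "Validate bill"),
    ("Send payment", "Send payment to supplier"),
}

found = match_activities(process_a, process_b)
precision, recall = precision_recall(found, gold)
print(found, precision, recall)
```

In this toy example the matcher finds only the pair that shares words, so every reported correspondence is correct but one true correspondence is missed. This mirrors, in a very simplified way, the pattern the abstract describes for algorithm-driven approaches (precision above recall); data that enriches the labels with related terms would be one way to recover such missed pairs.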
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/140090