User-centric vs system-centric evaluation of recommender systems: a case study

Recommender Systems (RS) are software tools that aim to reduce information overload on the web by suggesting items likely to interest the user. RS are widely used in application domains such as e-commerce, tourism and movie recommendation. There are two main approaches to evaluating the quality of an RS. The system-centric approach, also called offline, relies on datasets of preferences and opinions on items previously collected from users, which makes it inexpensive and easy to reproduce. The user-centric approach, also called online, measures the quality of the RS while real users interact with the system; it is expensive to run but leads to significant results. However, many works have concluded that the results of the two approaches are often not correlated. We worked with Blacknut, a videogame startup, to verify whether the two approaches correlate in this particular domain. To this end, we performed two studies. The first was a system-centric study based on accuracy error metrics and classification accuracy metrics, used to select promising candidates for the online test. The second was a user-centric study, for which we ran an A/B test, the computer engineering counterpart of a randomized controlled trial: different sets of users are exposed to different recommendation algorithms, and performance is measured by analyzing system logs. We concluded that accuracy error metrics are misleading predictors of the algorithms' online performance. In contrast, the ranking predicted by the classification accuracy metrics was reflected in the results of the user-centric study. The findings of this work add to the body of studies that compare the system-centric and user-centric approaches, and they can also inform the design of an RS in this domain.
Assessing the quality of an RS remains an important open question; being able to measure a system's quality before running expensive online experiments would be a valuable resource for companies.

Recommender Systems are software tools that reduce information overload on the web by suggesting items likely to interest the user. Recommender systems are widely used in many sectors, such as e-commerce, tourism and music streaming services. There are two main approaches to evaluating the quality of a recommender system. The system-centric approach, also called offline, is based on datasets of preferences and opinions on items previously collected from users, and is therefore inexpensive and easy to reproduce. The other method, called user-centric or online, measures the quality of the recommender system while real users interact with it; it is expensive to run but leads to significant results. However, many studies have concluded that the two approaches are not always correlated. We worked with Blacknut, a videogame startup, to verify whether the two evaluation approaches are correlated in this particular domain. As a methodological approach, we carried out two studies to reach our goal. The first was a system-centric study based on accuracy error metrics and classification accuracy metrics, used to select promising candidates for the online test. The second was a user-centric study; we opted for an A/B test, the computer science counterpart of randomized controlled trials. Different groups of users try different recommendation algorithms, and the final performance is assessed by analyzing the system logs. We concluded that accuracy error metrics are misleading in predicting the online performance of the algorithms. In contrast, the ranking predicted by the classification accuracy metrics mirrors the performance the algorithms achieved online.
The results of this research add to the collection of studies that compare system-centric and user-centric evaluation methods, and they can be used to develop a recommender system in this sector, namely videogames. Predicting the performance of a recommendation algorithm offline is an important open question; being able to assess the quality of an algorithm without expensive online experiments would be a valuable resource for companies.
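
The two families of offline metrics named in the abstract measure different things, and the contrast can be made concrete with a minimal sketch. The code below is an illustration only: the abstract does not say which specific metrics were used, so RMSE stands in for an accuracy error metric and precision@k for a classification metric, and the function names and toy data are hypothetical rather than taken from the study.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and observed ratings.

    Stands in for an "accuracy error metric": it scores how close the
    predicted rating values are to the true ones, over all items.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k recommended items that the user found relevant.

    Stands in for a "classification metric": it only looks at whether the
    items placed at the top of the list are the right ones.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k


# Toy, made-up data for a single user (not results from the study).
predicted_ratings = [3.0, 3.0, 3.0, 3.0]
observed_ratings = [4.0, 2.0, 3.0, 3.0]
recommended = ["game_a", "game_b", "game_c", "game_d"]
liked = {"game_a", "game_c", "game_z"}

print("RMSE:", rmse(predicted_ratings, observed_ratings))
print("precision@3:", precision_at_k(recommended, liked, 3))
```

An algorithm can reach a low RMSE by predicting near-average ratings everywhere while still ordering the top of the list poorly. This is one plausible reason the two metric families can rank the same algorithms differently.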