Exploiting LMs for clickbait detection in online news recommendation

As online news consumption grows, effective news recommendation becomes increasingly essential, and it must balance user interests with ethical considerations. However, news recommendation is less studied than movie or product recommendation, partly because of the limited availability of benchmark datasets. The area poses many technical challenges, ranging from the rapid decay of news relevance to the use of the articles' textual content. Clickbait detection, intended as the identification of articles whose preview differs from their actual content, is one of these challenges. News recommendation was the topic of the ACM RecSys Challenge 2024, organized by Ekstra-Bladet. In this context, our goal as the Politecnico di Milano team was to develop a winning solution for predicting the user's click in a given impression. This thesis describes the work we carried out as a team to reach the solution that allowed us to win the academic leaderboard, detailing the data analysis and the features we extracted from the dataset, with a focus on those that led to the most significant performance gains. We also devote attention to the models we used and to how we combined their predictions to maximize the final accuracy. We then focus on four research questions that emerged after the completion of the Challenge: RQ1) How do the recommendation lists change when we encode different textual parts of the articles and feed them to the recommendation algorithms? RQ2) Do the previous results change when we consider only correct recommendations? RQ3) Are the results influenced by the length of the encoded textual part? RQ4) How does the choice of the encoded part impact the overall recommendation trends? To answer these research questions, we compared the recommendation lists produced by three recommendation algorithms when fed with the embeddings of different parts of the articles, obtained by using Language Models as sentence encoders.
First, our goal is to verify that recommendations made using the title encoding differ from those made using the content encoding; if they do, we can infer the presence of a clickbait phenomenon (RQ1). Second, we want to verify that the results of the previous point depend on the semantic difference between title and content and not on other factors, such as the modest accuracy of the models (RQ2) or the length of the encoded components (RQ3). Finally, we want to identify individual candidate clickbait articles, recognizing them as the articles that are globally recommended significantly more often when the title encoding is used than when the content encoding is used (RQ4). For RQ1, we compared the correlation scores of the recommendation lists to verify whether the recommendation algorithms that leverage the title encoding are less correlated with those that use other parts of the article. At the end of the first experiment we found that our hypothesis held, and therefore that the strong semantic understanding of Language Models can be used to detect clickbait. To test RQ2, we repeated the previous analysis considering only the distribution of correct recommendations; since the results were comparable to those of the first experiment, we concluded that the accuracy of the recommenders does not compromise the results of the previous point. In the third experiment, we checked whether there was a linear or monotonic relationship between the correlation scores and the lengths of the encoded parts. The experiment showed that there was none, supporting the conclusion that the differences found in the previous points result from the semantic gap between the title and the rest of the article. Finally, in the last experiment, we examined how the general recommendation trends vary, noting some exceptional behavior by the recommenders that use the encoding of the article.
In other words, we found that the choice of the encoded part influences not only the ranking of the articles within the same recommendation list, but also the global behavior of the recommender systems, which tend to favor what we consider candidate clickbait articles.
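
As a purely illustrative sketch of the comparisons described above, the Python snippet below shows one way to correlate the scores that two recommenders assign to the same candidate articles (RQ1) and to flag articles recommended far more often under the title encoding (RQ4). Several things here are assumptions and not taken from the thesis: the use of Spearman rank correlation as the correlation score, the function names, the dictionary and list layouts, and the `min_ratio` threshold.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def list_correlation(scores_a, scores_b):
    """Rank correlation between two recommenders over their shared candidates.

    Each argument maps an article id to the score one recommender assigned
    (hypothetical layout, chosen for this sketch).
    """
    items = sorted(scores_a.keys() & scores_b.keys())
    return spearman([scores_a[i] for i in items], [scores_b[i] for i in items])


def clickbait_candidates(title_recs, content_recs, min_ratio=2.0):
    """Articles recommended at least `min_ratio` times more often with title
    encoding than with content encoding (the threshold is an assumption)."""
    title_counts = Counter(a for recs in title_recs for a in recs)
    content_counts = Counter(a for recs in content_recs for a in recs)
    return sorted(
        a for a, c in title_counts.items()
        if c >= min_ratio * max(content_counts[a], 1)
    )
```

The same `spearman` and `pearson` helpers could also serve the RQ3 check, by correlating the list-correlation scores with the lengths of the encoded parts: a value near zero for both would indicate neither a linear nor a monotonic relationship.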

With the growing spread of online news consumption, effective news recommendation becomes increasingly essential, balancing user interests with ethical concerns. However, news recommendation is less studied than movie or product recommendation, partly because of the limited availability of benchmark datasets. This field presents numerous technical difficulties, ranging from the rapid decay of news relevance to the use of the articles' textual content. Among these challenges is the detection of clickbait articles, intended as the identification of those articles whose preview differs from their actual content. News recommendation was the theme of the ACM RecSys Challenge 2024, organized by Ekstra-Bladet. In this context, our objective as the Politecnico di Milano team was to develop a winning solution for predicting user clicks on a given impression. This thesis describes the work we carried out as a team to reach the solution that allowed us to win the academic leaderboard, detailing the data analysis and the features we extracted from the dataset, with a particular focus on those that brought the largest performance improvements. We also devote attention to the models used and to how we combined their predictions to maximize the final accuracy. We then concentrate on four research questions that emerged at the end of the Challenge: RQ1) How do the recommendation lists change when we encode different textual parts of the articles and use them in the recommendation models? RQ2) Do the previous results change if we consider only the correct recommendations? RQ3) Are the results influenced by the length of the encoded textual part? RQ4) How does the choice of the encoded part affect the global recommendation trends?
To answer these questions, we compared the recommendation lists generated by three algorithms fed with embeddings of different parts of the articles, obtained using two Language Models. For RQ1, our goal was to verify that recommendations based on the title encoding differ from those based on the content encoding; if confirmed, this would suggest the presence of a clickbait phenomenon. By comparing the correlation scores between the lists, we verified that title-based models produce recommendations that are less correlated with those based on other parts, confirming that the semantic understanding of language models can be used to detect the clickbait phenomenon. For RQ2, we repeated the analysis considering only the distribution of correct recommendations. Since the results were comparable to those of the first experiment, we concluded that the accuracy of the models does not compromise the results of the previous point. In the third experiment, for RQ3, we checked whether there was a linear or monotonic relationship between the correlation scores and the length of the encoded parts. The experiment showed that there was no direct relationship, confirming that the differences derive exclusively from the semantic gap between title and content. Finally, for RQ4, we analyzed how the general recommendation trends vary depending on the encoded part. We found that the recommender systems tend to recommend some articles more often when using the title encoding than when using the content encoding; these articles are candidates for being clickbait.