Query intent mining and classification by embedding intent graphs

With the growing importance of search engines, users increasingly express their information needs through textual queries. These queries are short and usually composed of a few, often ambiguous, keywords. Understanding the intent behind a query (i.e. what the user really means) has therefore become important both for displaying precise, structured answers in search engines and for the business value that comes with it. One possible approach to detecting user intent is to classify textual queries into a set of intent categories. The goal of this work is to develop a supervised learning approach for classifying queries into intent categories. To overcome the lack of information and the ambiguity within queries, we apply Natural Language Processing techniques to enrich the data from a syntactic and semantic point of view, and we use this additional information in the downstream classification task, combining the textual query representation with a graph-based approach that leverages the expressive power of the intent graph data structure. Since the definition of the intent categories is one of the key aspects of a classification task, we also define an approach that uses the enriched data to abstract query terms into intent categories, with the goal of extracting meaningful categories directly from the data. Experimental results on a real dataset of customer queries show that using the intent graph data structure together with the query text leads to higher accuracy in the prediction task.
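To make the idea of combining a textual representation with graph-derived features more concrete, here is a minimal toy sketch. It is not the method of this work: the term-to-category table (`TERM_CATEGORY`), the per-query graph built by `intent_graph`, the use of category-node degrees as graph features (instead of learned graph embeddings), the nearest-centroid classifier, and the training queries are all hypothetical choices made only for illustration. Each query is turned into a bag-of-words vector concatenated with a small vector derived from its intent graph, and a label is predicted by cosine similarity to per-class centroids.

```python
from collections import Counter
import math

# Hypothetical term -> category abstraction. It stands in for the NLP
# enrichment that abstracts query terms into intent categories.
TERM_CATEGORY = {
    "cheap": "PRICE", "price": "PRICE", "cost": "PRICE",
    "buy": "PURCHASE", "order": "PURCHASE", "purchase": "PURCHASE",
    "open": "TIME", "hours": "TIME", "when": "TIME", "closing": "TIME",
}
CATEGORIES = sorted(set(TERM_CATEGORY.values()))


def intent_graph(query):
    """Toy per-query graph: consecutive terms are linked, and each term with a
    known category is linked to a category node named 'CAT:<category>'."""
    terms = query.lower().split()
    edges = set()
    for a, b in zip(terms, terms[1:]):
        if a != b:
            edges.add(frozenset((a, b)))
    for t in terms:
        if t in TERM_CATEGORY:
            edges.add(frozenset((t, "CAT:" + TERM_CATEGORY[t])))
    return terms, edges


def featurize(query, vocab):
    """Concatenate bag-of-words counts with category-node degrees from the graph."""
    terms, edges = intent_graph(query)
    counts = Counter(terms)
    degree = Counter(node for edge in edges for node in edge)
    text_vec = [counts[w] for w in vocab]
    graph_vec = [degree["CAT:" + c] for c in CATEGORIES]
    return text_vec + graph_vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train(examples):
    """Nearest-centroid classifier: average the feature vectors of each label."""
    vocab = sorted({w for q, _ in examples for w in q.lower().split()})
    sums, counts = {}, Counter()
    for query, label in examples:
        vec = featurize(query, vocab)
        acc = sums.get(label, [0.0] * len(vec))
        sums[label] = [a + x for a, x in zip(acc, vec)]
        counts[label] += 1
    centroids = {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}
    return vocab, centroids


def predict(query, vocab, centroids):
    """Return the label whose centroid is most similar to the query's features."""
    vec = featurize(query, vocab)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical labelled queries (illustrative only, not the thesis dataset).
TRAIN = [
    ("cheap flights price", "pricing"),
    ("how much does it cost", "pricing"),
    ("buy running shoes", "transactional"),
    ("order pizza online", "transactional"),
    ("store open hours", "store_info"),
    ("when does the bank open", "store_info"),
]

if __name__ == "__main__":
    vocab, centroids = train(TRAIN)
    for q in ["price of shoes", "order new shoes", "closing time tonight"]:
        print(q, "->", predict(q, vocab, centroids))
```

In this toy setup, "closing time tonight" shares no words with the training queries, so its text features are all zero; it is still assigned to `store_info` only because the term "closing" links to the `TIME` category node in its graph. This is the kind of effect that motivates enriching short, sparse queries with graph structure, although the real system's enrichment and graph representation may differ substantially.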
