Benchmarking graph neural network architectures for drug response prediction

The aim of this thesis is to explore how personalized medicine can benefit from advanced machine learning techniques that have become popular in recent years. In particular, we focus on antitumoral drug response prediction, a key step in the development of personalized medicine. The goal of drug response prediction is to predict how a cancer patient will respond to a drug, given the genomic information of the patient and the chemical structure of the drug. Many methods have been proposed in the literature to solve this problem; most of them represent drugs as strings of characters, which makes it difficult to exploit the information contained in their chemical structure. Only in recent years have methods been proposed that work directly on the molecular graph of the drugs. Moreover, existing methods often encode the cell lines with a simple convolutional operation, which only partially captures their complexity and structure. In this thesis, we benchmark different Graph Neural Network architectures for embedding drugs in the context of anticancer drug response prediction. We also propose a new method that introduces the attention mechanism into the encoding of cell-line gene expression, improving on state-of-the-art methods for this task by 12.72\% in terms of root mean square error (RMSE) and 2\% in terms of Pearson correlation coefficient (PCC).
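The two evaluation metrics named above can be made concrete with a short sketch. The Python below (standard library only) computes RMSE, $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, and PCC between observed and predicted drug responses, plus a helper for a percentage improvement in RMSE. It assumes that the 12.72\% figure is a relative reduction with respect to the baseline RMSE; the abstract does not state this, and the helper `relative_rmse_reduction` is a hypothetical name for illustration, not code from the thesis.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted responses."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted responses."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances share the 1/n factor, so it cancels and is omitted.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / math.sqrt(var_t * var_p)


def relative_rmse_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which the proposed RMSE is lower than the baseline RMSE.

    ASSUMPTION (hypothetical helper): "improvement" is read as a relative
    reduction; an absolute difference would be computed differently.
    """
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print("RMSE:", rmse(observed, predicted))
    print("PCC:", pearson(observed, predicted))
```

Under this reading, a baseline RMSE of 1.0 reduced to 0.8728 corresponds to a 12.72\% improvement. The values above are toy numbers, not results from the thesis.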
