In recent years, the scientific community's interest in computational approaches to biology has grown exponentially; in particular, pharmacology is a growing field in which machine learning techniques are receiving special attention. The recent COVID-19 pandemic has highlighted the need to speed up the analysis and development of drugs against new diseases, and a crucial step in this context is predicting the interaction between a drug and a protein. This task is needed to assess whether a developed drug can effectively interact with a protein, producing the physiological effects that are expected to treat a disease. For many years, this was done mainly through costly and time-consuming experiments, but recently numerous annotated datasets have emerged in the biological domain, paving the way for machine learning techniques. This thesis aims to build and test a model capable of predicting whether a drug interacts with a given protein, using Natural Language Processing tools. Our work focused on the analysis of protein and drug sequences, without exploiting any other specific features: we trained and tested a BERT-based model and achieved results comparable to those of techniques that make extensive use of biological information.
Language models for Drug-Target interaction prediction
BRUNELLO, NICOLÒ
2021/2022
Abstract
In recent years, the scientific community's interest in computational approaches to biology has grown exponentially; in particular, pharmacology is a growing field in which machine learning techniques are receiving special attention. The recent COVID-19 pandemic has highlighted the need to speed up analysis and drug development against new diseases, and a crucial step in this framework is drug-target interaction prediction. This task assesses whether a newly developed drug (or an existing one) can effectively interact with a protein, producing the physiological effects that are expected to treat a disease. For many years, drug-target interactions were determined mainly through expensive and time-consuming experiments. Recently, many annotated datasets have been created in the biological domain, paving the way for machine learning techniques. In this thesis we build and test a model capable of predicting whether a drug interacts with a given protein, using Natural Language Processing tools. We focused on the plain sequence descriptions of proteins and drugs, without leveraging any other domain-specific features. We trained and tested a BERT-based model and obtained results comparable with those of techniques that extensively exploit external biological information.

File | Size | Format | Access
---|---|---|---
Language_models_for_Drug_Target_interaction_prediction.pdf | 4.86 MB | Adobe PDF | Publicly accessible online
Executive_Summary_Language_models_for_Drug_Target_interaction_prediction.pdf | 374.46 kB | Adobe PDF | Publicly accessible online
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/188864