This thesis explores the application of Natural Language Processing (NLP) techniques to Drug-Target Interaction (DTI) prediction, concentrating on a Transformer-based model, specifically the Text-to-Text Transfer Transformer (T5), pre-trained on protein sequences in FASTA format. The novel approach further trains the T5 model on Simplified Molecular-Input Line-Entry System (SMILES) representations of chemical compounds, then fine-tunes it to predict the negative logarithm of the dissociation constant (-logKd) or of the inhibition constant (Ki) for drug-target interactions. The research investigates whether incorporating information about the protein’s pocket or binding site can improve the model’s DTI prediction performance. To evaluate the method, the scoring, ranking, and screening powers of the fine-tuned T5 model were compared against the established scoring functions in the Critical Assessment of Scoring Functions (CASF) framework. The thesis also discusses the implications of these results for drug discovery, in particular the potential for higher throughput in computational screening processes.
Enhancing drug-target interaction prediction in drug discovery using LLMs by integrating pocket information
Di TORO, LORENZO
2022/2023
Abstract
This thesis explores the application of Natural Language Processing (NLP) techniques to Drug-Target Interaction (DTI) prediction, focusing on a Transformer-based model, specifically the Text-to-Text Transfer Transformer (T5), pre-trained on protein sequences in FASTA format. The novel approach involves further training the T5 model on Simplified Molecular-Input Line-Entry System (SMILES) representations of chemical compounds, followed by fine-tuning it to predict the negative logarithm of the dissociation constant (-logKd) or of the inhibition constant (Ki) for drug-protein interactions. The research investigates whether incorporating information about the protein’s pocket or binding site can enhance the model’s performance in DTI prediction. To evaluate the efficacy of this method, the fine-tuned T5 model’s scoring, ranking, and forward screening powers were compared against established scoring functions within the Critical Assessment of Scoring Functions (CASF) framework. The thesis also discusses the implications of these results for drug discovery, particularly the potential for increased throughput in computational screening processes.

File | Size | Format | |
---|---|---|---|
2024_04_Di_Toro_Tesi.pdf (Thesis; accessible online only to authorized users) | 1.65 MB | Adobe PDF | View/Open |
2024_04_Di_Toro_Summary.pdf (Executive Summary; accessible online only to authorized users) | 442.21 kB | Adobe PDF | View/Open |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/218361