Treating diseases has proven fundamental to improving the quality of human life. The drug discovery process aims to find molecular compounds suitable for treating diseases. This process has typically been conducted through in vitro testing of compounds against a target protein, which requires labor, time, and chemical materials and therefore incurs high costs. The increase in computing power over the last decades has pushed researchers to adopt virtual screening pipelines, in which large collections of compounds, known as ligands, are tested against proteins using computer models. Ligand-based models are fast but not particularly accurate, while structure-based models yield more accurate results by leveraging structural information about the target protein, at a higher computational cost. A widely applied method of the second type, molecular docking, requires large computing resources that are available only on high-performance computing (HPC) infrastructure. However, HPC research has faced increasing challenges in speeding up computations because of difficulties in component miniaturization. These constraints have shifted research interest toward software optimizations that enable faster screening and greater efficiency. In this thesis, an optimized virtual screening pipeline is devised to tackle efficiency concerns in drug discovery. An initial molecular docking step on a small number of molecules guides a similarity-based selection of further ligands from a large molecular library. The pipeline automatically adjusts the number of selected ligands to respect a user-defined time budget allocated to a final molecular docking step. An additional speedup is achieved by reducing the number of protein locations where ligands are tested, known as Anchor Points. The combined filtering of ligands and Anchor Points introduces a trade-off between selection quality and computation time, but the pipeline always outperforms a random baseline in which ligands and Anchor Points are downsampled.
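As a rough illustration of the budget-aware selection described in the abstract, the Python sketch below ranks library ligands by their similarity to ligands that docked well in the initial step, then keeps only as many as fit the time budget. It is not the thesis implementation. The fingerprint representation (sets of on-bits), the use of Tanimoto similarity, the constant docking cost per ligand and Anchor Point, and all function and parameter names are assumptions made for the example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_ligands(library, seeds, time_budget_s, secs_per_ligand_anchor, n_anchor_points):
    """Return the names of the library ligands to dock in the final step.

    library: dict mapping ligand name -> fingerprint (frozenset of on-bits)
    seeds: non-empty list of fingerprints of ligands that scored well in the
           initial docking step (hypothetical input)
    time_budget_s: user-defined budget for the final docking step, in seconds
    secs_per_ligand_anchor: assumed constant docking cost per ligand per Anchor Point
    n_anchor_points: number of Anchor Points kept after filtering
    """
    # Assumed cost model: docking time grows linearly with the Anchor Point count.
    cost_per_ligand = secs_per_ligand_anchor * n_anchor_points
    max_ligands = int(time_budget_s // cost_per_ligand)

    # Rank ligands by their best similarity to any seed, most similar first.
    ranked = sorted(
        library.items(),
        key=lambda item: max(tanimoto(item[1], seed) for seed in seeds),
        reverse=True,
    )
    return [name for name, _ in ranked[:max_ligands]]


# Toy data: four library ligands and one well-docked seed.
library = {
    "lig_a": frozenset({1, 2, 3, 4}),
    "lig_b": frozenset({1, 2, 9}),
    "lig_c": frozenset({7, 8}),
    "lig_d": frozenset({1, 2, 3}),
}
seeds = [frozenset({1, 2, 3})]

# With 10 Anchor Points, a 60 s budget leaves room for 3 ligands.
print(select_ligands(library, seeds, 60, 2.0, 10))
# With 5 Anchor Points, the same budget covers the whole toy library.
print(select_ligands(library, seeds, 60, 2.0, 5))
```

Under these assumptions the trade-off mentioned in the abstract shows up directly: halving the number of Anchor Points doubles the number of ligands that fit the same budget, at the price of testing each ligand in fewer protein locations.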
Combining ligand-based and structure-based virtual screening approaches for efficient drug discovery experiments
Rizzo, Simone
2022/2023
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2024_04_Rizzo_Executive Summary_02.pdf | Executive Summary | 454.53 kB | Adobe PDF | Open Access from 20/03/2025 |
| 2024_04_Rizzo_Tesi_01.pdf | Thesis text | 2.43 MB | Adobe PDF | Open Access from 20/03/2025 |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/219779