FLIP-4-DD: scalable fragment-based design for drug discovery through interaction preservation and Pareto selection for HPC
De Simone, Anna
2024/2025
Abstract
The discovery of new drugs is a long and costly process with a high failure rate. Virtual screening can accelerate the early stages of research by rapidly exploring virtual chemical libraries, reducing both time and cost. Fragment-based design fits within this paradigm: it uses small molecular fragments as building blocks to construct new ligands. A fundamental challenge persists, however: many molecules generated in silico cannot be reproduced in the laboratory. This thesis introduces FLIP-4-DD (Fragment-Level docking, Interaction Preservation, and Pareto filtering for Drug Design), an iterative and scalable protocol that extends the CReM-dock framework with four main innovations: (i) deployment on high-performance computing (HPC) infrastructures through Nextflow and HyperQueue, enabling parallel ligand generation, docking, and analysis; (ii) use of the ProLIF library to identify the atoms involved in hydrogen bonds and protect them during generation, thereby preserving essential interactions; (iii) a multi-objective selection strategy based on the Pareto front that balances docking affinity against molecular weight; (iv) an automatic stopping criterion based on hypervolume improvement, which terminates the iterations once progress becomes marginal. Experimental validation shows that FLIP-4-DD generates ligands that are competitive with known drugs and outperform those produced by the reference CReM-dock pipeline. In particular, it yields lighter molecules with comparable or higher predicted affinity and better preservation of hydrogen bonds. Overall, these findings show that combining fragment-based generation, interaction preservation, and multi-objective optimization within an HPC workflow can accelerate de novo drug design and produce more realistic candidates.
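The Pareto selection (iii) and the hypervolume stopping rule (iv) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything here is an assumption: each candidate is reduced to a (docking score, molecular weight) pair with both objectives minimized (Vina-style scores in kcal/mol, where more negative means stronger predicted binding); the reference point `REF`, the tolerance `rel_tol`, the helper names, and the `MOCK_BATCHES` values are hypothetical, and the mock numbers are not results from the thesis. This is not the FLIP-4-DD implementation.

```python
from typing import List, Tuple

# (docking_score, molecular_weight): both objectives are minimized.
# Assumption: Vina-style scores in kcal/mol, where more negative means stronger predicted binding.
Point = Tuple[float, float]

# Hypothetical reference point (worst acceptable values): score 0 kcal/mol, MW 600 Da.
REF: Point = (0.0, 600.0)

# Mock (score, MW) batches, one per iteration; illustrative numbers, NOT results from the thesis.
MOCK_BATCHES: List[List[Point]] = [
    [(-7.0, 350.0), (-6.5, 300.0), (-6.0, 420.0)],
    [(-7.8, 380.0), (-6.9, 310.0), (-6.0, 250.0)],
    [(-7.8, 378.0), (-6.9, 309.0), (-6.1, 250.0)],
]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated subset, sorted by docking score (O(n^2), adequate for a sketch)."""
    return sorted({p for p in points if not any(dominates(q, p) for q in points)})


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded by ref (exact 2-D minimization sweep)."""
    area, prev_y = 0.0, ref[1]
    # Sweep by increasing score; only points that lower the best MW seen so far add area.
    for x, y in sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1]):
        if y < prev_y:
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area


def should_stop(hv_history: List[float], rel_tol: float = 0.01) -> bool:
    """Stop once the latest relative hypervolume gain falls below rel_tol (hypothetical threshold)."""
    if len(hv_history) < 2 or hv_history[-2] <= 0:
        return False
    prev, curr = hv_history[-2], hv_history[-1]
    return (curr - prev) / prev < rel_tol


def run(batches: List[List[Point]], ref: Point, rel_tol: float = 0.01):
    """Accumulate candidates, keep the Pareto front, and stop when the hypervolume plateaus."""
    archive: List[Point] = []
    history: List[float] = []
    iteration, front = 0, []
    for iteration, batch in enumerate(batches, start=1):
        archive.extend(batch)  # stand-in for one generate-and-dock round
        front = pareto_front(archive)
        history.append(hypervolume_2d(front, ref))
        if should_stop(history, rel_tol):
            break
    return iteration, front, history


if __name__ == "__main__":
    it, front, hv = run(MOCK_BATCHES, REF)
    print(f"stopped after iteration {it}; front = {front}")
    print("hypervolume per iteration:", [round(v, 1) for v in hv])
```

The script accumulates the mock batches, recomputes the non-dominated set after each round, and stops once the relative hypervolume gain drops below 1%. The quadratic dominance check and the exact 2-D sweep are chosen for readability; larger candidate pools or more than two objectives would call for a dedicated multi-objective library.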
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_10_DeSimone_01.pdf | Thesis | 7.98 MB | Adobe PDF | Authorized users only, from 28/09/2028 |
| 2025_10_DeSimone_02.pdf | Executive Summary | 2.33 MB | Adobe PDF | Authorized users only, from 28/09/2028 |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/243481