OpenRecombinHunt: automatic detection of recombination from publicly available viral sequences
TOPCUOGLU, YAVUZ SAMET
2024/2025
Abstract
The rapid evolution of viruses through mechanisms such as recombination poses a significant challenge to global public health, as demonstrated by recent outbreaks of COVID-19 and monkeypox. The vast amount of sequence data in public databases is a critical resource for genomic surveillance, but its scale makes manual detection of recombination events infeasible. Furthermore, existing computational tools are often limited to viruses with well-defined lineage nomenclatures, which creates a bottleneck for analyzing newly emerging or less-studied pathogens.

This thesis presents OpenRecombinHunt, a novel, fully automated, and configurable pipeline designed to overcome these challenges. The pipeline integrates two state-of-the-art tools: HaploCoV for de novo lineage generation and mutation profiling, and RecombinHunt for data-driven recombination detection. This integration extends recombination analysis to viruses without a pre-existing classification system. The entire workflow, from data acquisition and preprocessing to core analysis and visualization of results in an interactive Streamlit dashboard, is designed for automated, periodic execution.

The OpenRecombinHunt pipeline was applied to seven distinct viral taxa: SARS-CoV-2, Influenza A (H5N1), Monkeypox, RSV-A, RSV-B, Yellow Fever, and Zika virus. The analysis revealed varying recombination rates across these pathogens, with the highest rate observed in Monkeypox (15.72%). For SARS-CoV-2, a comparison with previous recombination studies confirms that the pipeline reproduces established findings while also detecting additional events, underscoring its robustness and applicability to ongoing genomic surveillance.

This work contributes a robust, scalable, and broadly applicable tool for the automated surveillance of viral recombination. By enabling analysis of viruses regardless of their classification status and presenting the results in an accessible web interface, OpenRecombinHunt provides a significant enhancement to our capabilities in monitoring viral evolution.

| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_10_topcuoglu_thesis_01.pdf | Thesis | 7.39 MB | Adobe PDF | Not accessible |
| 2025_10_topcuoglu_executivesummary_02.pdf | Executive summary | 1.26 MB | Adobe PDF | Not accessible |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/243347