Protein function prediction via protein sequence: FLAN-T5-Driven Explainable Approach
FALTAOUS, ABANOUB
2023/2024
Abstract
This thesis studies the generation of text that describes the function of a protein solely from its amino acid sequence. Proteins, essential biomolecules with diverse functions in living organisms, are composed of sequences of amino acids. Inferring their functions from sequence alone is a long-standing challenge in bioinformatics and molecular biology. The thesis addresses this challenge by fine-tuning a pre-trained FLAN-T5 model to generate textual descriptions of protein functions. It contributes to the growing body of work at the intersection of bioinformatics, natural language processing, and machine learning, and offers an approach to decoding the functional information embedded in protein sequences.

| File | Access | Size | Format |
|---|---|---|---|
| 2024_7_Faltaous_Tesi_01.pdf | Open to everyone on the internet | 3.37 MB | Adobe PDF |
| 2024_7_Faltaous_Executive Summary_02.pdf | Open to everyone on the internet | 302.17 kB | Adobe PDF |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/223092