Over the last decade, the development of efficient monitoring tools for process control and optimisation has assumed an increasingly important role, especially where spectroscopic techniques such as Raman spectroscopy are involved. The goal of this project was to use Raman spectral data to build statistical regression models that monitor control-relevant process parameters online during the production of monoclonal antibodies in different operating modes (i.e. fed-batch and perfusion). The standard analysis workflow was decomposed into its steps, and each step was individually analysed, optimised, evaluated in multivariate calibration experiments performed on different datasets, and finally critically discussed. The analysis steps included data preparation, data pre-treatment, data pre-processing, regression and ensemble learning. In the first step, the spectral data were analysed with univariate statistical techniques and unsupervised multivariate tools to improve the understanding of the datasets and to identify clear trends or outliers. The investigation of pre-treatment and pre-processing then highlighted how important correct hyperparameter selection is and, at the same time, how difficult it is to select the correct setting. In the analysis of the regression techniques, partial least-squares regression (PLSR) generally performed better than support vector regression (SVR). Finally, the ensemble learning techniques were not able to improve model performance significantly. For each step, many tools were implemented, ranging from classical chemometric methods to more advanced ones. The comparison of the techniques showed that in most cases classical methods tended to perform better on the test datasets than more advanced ones. It is assumed that this discrepancy was due to the low variability within the different datasets and the small number of data points.
Within this work, a wide variety of techniques was implemented and an advanced spectral predictive modelling platform was developed. As a next step, the approach should be validated on larger datasets to exploit the benefits of more advanced techniques for maximised model accuracy and robustness, and to define recommendations for technique selection.
Over the last decade, the development of monitoring techniques aimed at controlling and optimising production processes has assumed an increasingly important role in the biopharmaceutical sector, especially with regard to the use of spectroscopic techniques, such as Raman spectroscopy, as the main analytical tool. The goal of this project was to use measurements acquired with a Raman spectrometer to build statistical models able to monitor and predict the production of monoclonal antibodies in different cell cultures (namely fed-batch and perfusion). To this end, the standard protocol for analysing spectroscopy-based data was decomposed into its fundamental steps: data preparation, pre-treatment and pre-processing, calibration, and ensemble learning. Preparing the datasets allowed a preliminary analysis to identify particular trends. Subsequently, the pre-treatment and pre-processing techniques highlighted how strongly their parameters affect the final regression and how difficult it can be to optimise the whole workflow. In the analysis of the model-building techniques, PLSR performed better on average than SVR. Finally, the ensemble learning techniques showed no influence on performance compared with a model calibrated using SVR. During the project, several tools were implemented to address the steps listed above. These ranged from techniques historically used in chemometrics to more advanced tools from the field of machine learning. The classical techniques generally outperformed the more advanced ones. This is believed to be due to the data used during the project, which came from small datasets containing few different types of observations.
Nevertheless, a library of tools was created that serves as a basis for future data analysis applications in spectroscopy. Validating these methods on larger datasets would make it possible to test the more advanced techniques and exploit their full potential. In addition, using datasets containing more heterogeneous measurements could reveal trends in the data that the original datasets did not contain because of their limited scope.
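The pre-processing step and its dependence on hyperparameters, as summarised above, can be illustrated with a minimal sketch. It is not taken from the thesis: the choice of a moving-average smoother followed by standard normal variate (SNV) scaling is an assumption made for illustration, the function names `snv` and `moving_average` are hypothetical, and the intensities are invented. The smoothing window plays the role of a pre-processing hyperparameter whose value changes the spectrum passed on to the regression step.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum
    by its own mean and sample standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(x - mu) / sigma for x in spectrum]


def moving_average(spectrum, window):
    """Smooth with a centred moving average.
    `window` is the hyperparameter; it must be a positive odd integer."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(mean(spectrum[lo:hi]))  # window shrinks at the edges
    return smoothed


# Hypothetical intensities for a single spectrum (not data from the thesis).
raw = [10.0, 12.0, 11.0, 30.0, 12.0, 11.0, 10.0]

# The same spectrum, pre-processed with three different window settings.
for window in (1, 3, 5):
    processed = snv(moving_average(raw, window))
    print(window, [round(v, 2) for v in processed])
```

Printing the result for several window widths shows how a single setting reshapes the peak before any model is fitted, which is one reason why selecting such settings is hard to separate from evaluating the downstream regression.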
Development of advanced modelling techniques for Raman based spectral data of bioprocesses
BONIOLO, FABIO
2017/2018
File: MT_BonioloFabio.pdf (Adobe PDF, 2.94 MB), not accessible
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/145202