Automatic derivative-free optimization of chemical process simulations using surrogate models

Recent advances in machine learning (ML) underline the importance of applying this powerful set of mathematical tools to engineering. One task in which ML can be employed is the optimization of chemical processes. This work builds upon Tiresias, the software developed by the SuPER Team of Politecnico di Milano for the automatic training of surrogate models of chemical processes from Aspen HYSYS simulations or real plant data from sensors. Tiresias includes all the steps needed to build surrogate models of chemical processes: the design of experiments (DOE), the pre-processing of data, the training, and the selection of the best models through a metric calculated from cross-validated data. This work makes two main contributions. The first is the implementation of a space-filling sequential DOE strategy, adopted after it was found that adaptive sampling based on exploitation is not feasible. The second is the introduction of an optimization framework based on a derivative-free optimization algorithm, MADS, provided by the free software NOMAD. This framework addresses one of the main limitations of Tiresias in its current form: the lack of features that allow the trained surrogate models to be applied to real-world cases. The possibility of optimizing the cost of the utilities of a process now brings future versions of Tiresias closer to industrial application. Other improvements have been added to Tiresias, such as the ability to test the models on unseen data. The results on four case studies indicate that sequential sampling does not bring any benefit in terms of computational savings, which should have been one of its advantages, and that it tends to overfit the models. The optimizer, instead, works as intended as long as the models are accurate. This confirms the potential of the tool for process optimization even when the Optimizer of Aspen HYSYS does not converge easily, with the added advantages of being easy to use and of being able to optimize for the mass fractions of the streams, which the Optimizer of HYSYS does not allow.
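
As an illustration of what derivative-free optimization of a surrogate means in practice, the following is a minimal Python sketch. It is not MADS, it does not call NOMAD, and it is not taken from Tiresias: it uses compass (coordinate) search, a much simpler member of the same direct-search family, which polls only the fixed coordinate directions and halves its step when no poll point improves the objective. The function `utility_cost_surrogate`, its two inputs (`reflux`, `duty`), its coefficients and the bounds are all hypothetical stand-ins for a trained surrogate model.

```python
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def compass_search(
    f: Callable[[List[float]], float],
    x0: Sequence[float],
    bounds: Bounds,
    step: float = 0.25,
    min_step: float = 1e-6,
    max_evals: int = 10_000,
) -> Tuple[List[float], float, int]:
    """Minimize f inside box bounds by polling +/- each coordinate direction.

    `step` is a fraction of each variable's range and is halved whenever a
    full poll finds no improvement. Only function values are used.
    """
    # Clip the starting point into the box.
    x = [min(max(v, lo), hi) for v, (lo, hi) in zip(x0, bounds)]
    fx = f(x)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] = min(max(x[i] + sign * step * (hi - lo), lo), hi)
                if trial[i] == x[i]:
                    continue  # already on the bound in this direction
                ft = f(trial)
                evals += 1
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break  # accept the move, go on to the next coordinate
        if not improved:
            step *= 0.5  # refine the step after an unsuccessful poll
    return x, fx, evals


def utility_cost_surrogate(x: List[float]) -> float:
    """Hypothetical stand-in for a trained surrogate of utility cost."""
    reflux, duty = x
    return (reflux - 2.0) ** 2 + 0.5 * (duty - 1.5) ** 2 + 0.3 * reflux * duty


best_x, best_cost, n_evals = compass_search(
    utility_cost_surrogate, x0=[4.0, 4.0], bounds=[(1.0, 5.0), (0.0, 5.0)]
)
print(best_x, best_cost, n_evals)
```

Because the search queries only the surrogate, each evaluation is cheap compared with a full process simulation. The quality of the optimum it returns, however, is bounded by the accuracy of the surrogate, which is consistent with the observation above that the optimizer works as intended only as long as the models are accurate.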
