A Framework to Accelerate the Adaptation of Machine Learning Models

In recent years, industrial applications have increasingly relied on Machine Learning (ML) models to support decision-making processes. Once an ML model is deployed in a real-world production setting, it must cope with data non-stationarity, a common issue that degrades the learner's performance and requires continuous adaptation to the dynamic characteristics of the environment. The growing interest in this field has paved the way for monitoring and adaptation platforms that automatically manage the lifecycle of ML models. However, adaptation modules in particular often place little emphasis on the quality of retraining techniques, because it is difficult to determine, once a concept drift is detected, whether it is worthwhile to retrain the monitored model or to accept a retraining suggestion provided by these platforms. In this thesis, we focus on classification problems in an online, non-stationary setting and present a learner-agnostic framework for dealing with such problems in the context of retraining techniques based on importance weighting. The framework consists of a support tool that helps decide when to accept such suggestions and an active learning component that accelerates reaching the optimal time for retraining. We tested our framework on synthetic data, and the results show that it can help achieve better retraining quality under non-stationary conditions.

In recent years, practical industrial applications have made growing use of Machine Learning (ML) models to support decision-making processes. When an ML model is deployed in a real-world production context, it must face the common problem of data non-stationarity, which affects the model's performance and requires continuous adaptation to the dynamic characteristics of the environment. The growing interest in this field has opened the way to the development of monitoring and adaptation platforms that automatically manage the lifecycle of ML models. However, little importance is commonly given to the quality of retraining techniques, particularly in adaptation modules, because of the difficulty of determining whether, when a change in the structure of the data is detected, it is useful to retrain the monitored model or to accept the retraining suggestions provided by these platforms. In this thesis, we concentrate on classification problems in an online, non-stationary context and present a model-independent framework for addressing such problems in the context of retraining techniques related to the importance of the data. It consists of a support tool that can help understand when to accept such suggestions and an active learning framework that accelerates reaching the optimal moment for retraining. We tested our framework with synthetic data, showing that it can contribute to obtaining a better retraining quality under non-stationary conditions.