Hyperparameter optimization under runtime constraints

This thesis proposes a budget-aware extension to an existing hyperparameter tuning pipeline for CoCoAFusE, a computationally expensive Bayesian Mixture-of-Experts model trained via MCMC. The methodology has two steps. First, a predictive runtime model is fitted to historical tuning logs. Second, the fitted model is translated into a probabilistic feasibility rule under a user-defined per-trial budget B and risk level τ, and this rule is enforced during sampling by restricting some variables (e.g., the number of iterations) to a feasible range. The framework is evaluated on two dynamical benchmarks: Cascaded Tanks and Pick-and-Place. For Cascaded Tanks, the runtime model achieves good predictive accuracy (R² = 0.815 train, 0.861 test), and the feasibility filter yields high acceptance rates (≈ 0.89–0.91) with no observed budget violations under B = 1 hour for τ ∈ {0.05, 0.1}. For the more computationally demanding Pick-and-Place benchmark, runtime prediction is less accurate for the longest trials (R² = 0.842 train, 0.714 test), and under B = 8 hours and τ = 0.05 the acceptance rate is lower (0.848). In this setting, only a small fraction of trials (0.015) exceeded the budget, and the exceedances were small. A comparison of predictive performance across tuning regimes (no tuning, unrestricted tuning, and budget-constrained tuning) shows that the impact of feasibility filtering depends on how restrictive the budget is and on the quality of the runtime prediction. Overall, the results support budget-aware feasibility as a practical way to trade off time to results against model quality, and to make tuning throughput more predictable when planning timelines and deliverables, without requiring changes to the complexity of the underlying model.
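The two steps can be illustrated with a minimal Python sketch. The abstract does not specify the form of the runtime model, so everything below is an assumption made for illustration: runtime is taken to be log-linear in the number of iterations with Gaussian residuals, the logs are synthetic rather than taken from the thesis, and the function names `fit_log_runtime` and `max_feasible_iterations` are hypothetical. Under these assumptions, the feasibility rule P(runtime > B) ≤ τ reduces to an upper bound on the number of iterations.

```python
import math
from statistics import NormalDist

import numpy as np


def fit_log_runtime(iterations, runtimes_s):
    """Least-squares fit of log(runtime) = a + b * log(iterations).

    Assumed model form (not taken from the thesis).
    Returns (a, b, sigma), where sigma is the residual standard deviation.
    """
    x = np.log(np.asarray(iterations, dtype=float))
    y = np.log(np.asarray(runtimes_s, dtype=float))
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = float(np.sqrt(resid @ resid / max(len(y) - 2, 1)))
    return float(coef[0]), float(coef[1]), sigma


def max_feasible_iterations(a, b, sigma, budget_s, tau):
    """Largest iteration count n with P(runtime > budget) <= tau.

    With Gaussian residuals on the log scale, the rule is equivalent to
    a + b * log(n) + z_(1 - tau) * sigma <= log(budget).
    """
    if b <= 0:
        raise ValueError("runtime must increase with iterations (b > 0)")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - tau)
    log_n = (math.log(budget_s) - a - z * sigma) / b
    return math.floor(math.exp(log_n))


# Synthetic tuning logs (illustrative only): about 0.5 s per iteration
# with multiplicative noise.
rng = np.random.default_rng(0)
iters = rng.integers(500, 5000, size=200)
runtimes = 0.5 * iters * np.exp(rng.normal(0.0, 0.1, size=200))

a, b, sigma = fit_log_runtime(iters, runtimes)

budget_s = 3600.0  # B = 1 hour per trial
n_05 = max_feasible_iterations(a, b, sigma, budget_s, tau=0.05)
n_10 = max_feasible_iterations(a, b, sigma, budget_s, tau=0.10)
print(f"iteration cap: tau=0.05 -> {n_05}, tau=0.10 -> {n_10}")
```

In a tuning loop, the sampler would then draw the number of iterations from the range up to this cap instead of the unrestricted range. A smaller τ gives a tighter cap, which is the trade-off between acceptance rate and budget violations that the abstract reports.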
