A stochastic approach for scheduling AI training jobs in GPU-based systems
MARCHI, LORENZO
2021/2022
Abstract
Deep Learning methods are currently used to address a variety of complex tasks. This is partly because these models are now trained on GPUs, which expands the range of problems that can be solved in a reasonable computing time. As a result, the demand for high-performance GPU-based cloud servers has increased dramatically, making it necessary for Cloud Service Providers (CSPs) to manage that demand effectively. In this thesis, we optimize the scheduling of Deep Learning training jobs from the perspective of a CSP running a data center, efficiently selecting resources for the execution of each job in order to minimize average power consumption. We modeled this problem through a Mixed Integer Linear Programming (MILP) formulation, and we developed a stochastic heuristic that, exploiting the probability distribution of early termination, determines how to vary the resource assignment during the execution of each job so as to minimize the expected energy cost while still fulfilling deadlines. We set up an extensive experimental campaign of simulations to test the quality of the solutions identified by our method. The results show that our heuristic delivers significantly better results than other methods in the literature, with an average energy cost reduction of about 38-40%. We also demonstrate the applicability of our method in real-world settings, as obtaining optimal schedules for systems of up to 100 nodes and 400 concurrent jobs requires less than 60 seconds. Finally, we evaluated the effectiveness of GPU sharing, that is, running multiple jobs on a single GPU. The results show that, depending on the workload and the available GPU memory, GPU sharing can reduce energy costs by 17-29% on average.
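To make the expected-value reasoning above concrete, here is a minimal Python sketch (not the thesis implementation) of how a scheduler might pick, for a single job, the GPU configuration with the lowest expected energy cost when the job may terminate early. The candidate configurations, power figures, deadline, and the flat per-epoch early-stopping probability are all illustrative assumptions.

```python
# Minimal sketch: pick the GPU configuration with the lowest *expected*
# energy cost for one training job that may stop early. All values below
# are illustrative assumptions, not data from the thesis.

# Candidate configurations: GPU count, power draw (W), time per epoch (s).
CONFIGS = [
    {"gpus": 1, "power_w": 250.0, "sec_per_epoch": 120.0},
    {"gpus": 2, "power_w": 480.0, "sec_per_epoch": 70.0},
    {"gpus": 4, "power_w": 900.0, "sec_per_epoch": 40.0},
]

MAX_EPOCHS = 100        # epochs executed if early stopping never triggers
DEADLINE_S = 8_000.0    # wall-clock budget for the job

# p_alive[t]: probability that epoch t is actually executed, i.e. the job
# has not terminated early before reaching it (flat 2% stop chance per epoch).
p_alive = [0.98 ** t for t in range(MAX_EPOCHS)]

def expected_energy(cfg):
    """Expected energy (J): sum over epochs of
    P(epoch is executed) * power * epoch duration."""
    per_epoch_joules = cfg["power_w"] * cfg["sec_per_epoch"]
    return sum(p * per_epoch_joules for p in p_alive)

def meets_deadline(cfg):
    """Check the deadline against the worst case (no early stop)."""
    return MAX_EPOCHS * cfg["sec_per_epoch"] <= DEADLINE_S

feasible = [c for c in CONFIGS if meets_deadline(c)]
best = min(feasible, key=expected_energy)
print(f"chosen: {best['gpus']} GPU(s), "
      f"expected energy ~ {expected_energy(best) / 3.6e6:.3f} kWh")
```

In this toy setting the single-GPU configuration is ruled out by the deadline, and the two-GPU one wins on expected energy: accounting for early termination makes the slower but less power-hungry feasible option attractive, which is the trade-off the thesis heuristic exploits at data-center scale by re-evaluating assignments during execution.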
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2022_12_Marchi_01.pdf | Thesis | 1.71 MB | Adobe PDF | openly accessible on the internet |
| 2022_12_Marchi_02.pdf | Executive Summary | 594.48 kB | Adobe PDF | openly accessible on the internet |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/198573