Convergence to global minimizers for a class of learning models using optimal transport techniques
Bagnara, Marco
2019/2020
Abstract
This thesis examines an innovative approach, recently introduced in [CB18], to the study of convergence for a class of learning problems. Many tasks in machine learning and signal processing can be solved by minimizing a convex functional over the space of signed measures. A possible numerical approach is to discretize the unknown measure into a mixture of particles, parametrized not only by their mass but also by their position, and then to proceed with a gradient descent method. Unfortunately, this approach loses the convexity of the original problem, and with it the guarantee of finding a global minimum; nevertheless, it is commonly used in practice. Although at the moment [CB18] is limited to simple cases (such as sparse spikes deconvolution or training a neural network with a single hidden layer), it provides conditions that guarantee convergence to a global minimizer in the many-particle limit. This result should be interpreted mainly as a consistency principle for the use of non-convex optimization methods. Despite this applied flavour, the work should be considered theoretical. The study of convergence makes wide use of gradient flows in Wasserstein spaces, which are spaces of probability measures arising from optimal transport theory. For this reason, a whole chapter of this thesis is devoted to an introduction to optimal transport theory, drafted following the path marked by [San15] and, when necessary, drawing from [AGS08]. Another chapter is devoted to the introduction of gradient flows, both in Euclidean and in Wasserstein spaces; for the former, reference was made to [San17], while for the latter to [AGS08]. My personal contribution can be seen in several respects. First, I gave a more organic and organized structure to the material in [CB18], devoting appropriate space to the introduction of optimal transport and gradient flow theories. Second, where possible, I deepened some aspects and provided alternative proofs.
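As a rough illustration of the particle discretization mentioned in the abstract (the symbols F, w_i and θ_i are illustrative and not taken from the thesis), the idea can be sketched as

\[
\min_{\mu}\; F(\mu)
\quad\longrightarrow\quad
\min_{(w_i,\theta_i)_{i=1}^{n}}\; F\Big(\tfrac{1}{n}\sum_{i=1}^{n} w_i\,\delta_{\theta_i}\Big),
\]

where $F$ is convex over measures $\mu$, each particle $i$ carries a mass $w_i$ and a position $\theta_i$, and $\delta_{\theta}$ denotes a Dirac mass at $\theta$. Gradient descent is then run jointly on the masses and positions; although this finite-dimensional problem is in general non-convex, the conditions studied in [CB18] guarantee convergence to a global minimizer in the many-particle limit $n \to \infty$.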
File | Description | Size | Format | Access
---|---|---|---|---
2020_12_Bagnara.pdf | Thesis text | 778.25 kB | Adobe PDF | not accessible
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/170366