Deep reinforcement learning for corporate bond market making in multidimensional case
Cioffi, Andrea Isidoro; Di Franco, Filippo
2021/2022
Abstract
Our thesis is a reproduction of the model presented in the paper by Guéant and Manziuk [8]. The context of interest is that of corporate bond markets, which are mainly OTC markets, where the key players are market makers. The market maker has the crucial role of determining the optimal bid and ask prices for the bonds traded. The goal is to propose prices that maximise the profit generated by the difference between bid and ask, while mitigating the market risk associated with holding inventory. Several models in the literature describe the optimisation problem faced by the market maker, most of them inspired by the Avellaneda-Stoikov model. Although they can often be generalised to a multi-asset framework, these models mainly focus on the numerical solution of the problem with a single asset. The aim of the authors in [8] is to propose a numerical method capable of scaling to the multidimensional case, exploiting the increasingly popular Reinforcement Learning and Deep Learning techniques. Specifically, the method is based on a discrete-time reformulation of the stochastic optimal control problem for market making, within the framework of a model inspired by Avellaneda-Stoikov. The presented algorithm has the structure of an actor-critic, in which value function and policy are approximated by means of deep neural networks and rewards are computed through Monte Carlo simulations based on the defined model.
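To make the actor-critic structure described above concrete, the following is a minimal, self-contained sketch of the same idea on a deliberately simplified toy: a single asset, tabular (not deep-network) policy and value function, TD-style updates instead of the authors' Monte Carlo reward estimation, and an assumed exponential fill intensity A·exp(-k·δ) in the spirit of Avellaneda-Stoikov. All parameter values (Q, SPREADS, A, K, PHI, etc.) are illustrative assumptions, not taken from [8].

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete-time market-making environment (illustrative only):
# inventory q is bounded in [-Q, Q]; the agent picks a symmetric
# half-spread delta; each side fills with probability A * exp(-K * delta).
Q = 3                                  # inventory bound (assumed)
SPREADS = np.array([0.5, 1.0, 1.5])    # candidate half-spreads (assumed)
A, K = 0.9, 1.0                        # fill-intensity parameters (assumed)
PHI = 0.1                              # running inventory penalty (assumed)
GAMMA = 0.99                           # discount factor
T = 30                                 # episode length

n_states = 2 * Q + 1
n_actions = len(SPREADS)

def step(q, a):
    """One environment step: returns (reward, next inventory)."""
    delta = SPREADS[a]
    p_fill = A * np.exp(-K * delta)          # fill probability per side
    buy = rng.random() < p_fill and q < Q    # bid hit  -> inventory up
    sell = rng.random() < p_fill and q > -Q  # ask lifted -> inventory down
    pnl = delta * (buy + sell)               # earn the half-spread per fill
    q_next = q + buy - sell
    return pnl - PHI * q_next ** 2, q_next   # spread PnL minus risk penalty

# Tabular actor-critic: softmax policy parameters theta[s, a], value V[s].
theta = np.zeros((n_states, n_actions))
V = np.zeros(n_states)
ALPHA_PI, ALPHA_V = 0.05, 0.1

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for episode in range(2000):
    q = 0
    for t in range(T):
        s = q + Q
        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        r, q = step(q, a)
        s_next = q + Q
        # Critic: TD(0) update; Actor: policy-gradient step on the
        # log-likelihood, weighted by the TD error as advantage estimate.
        td = r + GAMMA * V[s_next] - V[s]
        V[s] += ALPHA_V * td
        grad = -probs
        grad[a] += 1.0
        theta[s] += ALPHA_PI * td * grad

print("V:", np.round(V, 2))
print("greedy half-spread per inventory:", SPREADS[theta.argmax(axis=1)])
```

The paper's method replaces the two tables with deep neural networks and the TD error with Monte Carlo reward estimates, which is what allows it to scale to the multidimensional (multi-bond) case where a tabular state space would be intractable.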
File: Tesi_Cioffi_DiFranco.pdf (11.01 MB, Adobe PDF), accessible online only to authorised users.
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/198682