Reinforcement learning for optimal execution in the cryptocurrency market

The birth of cryptocurrencies has led to enormous social and economic changes. Cryptocurrencies were created with a specific purpose: to improve people's standard of living by making them less dependent on single, central entities, which they achieve through decentralization based on blockchains. The first cryptocurrency, Bitcoin, was introduced in 2009. It is a digital currency that can be exchanged for goods or services with vendors that accept it as payment. It has grown dramatically in recent years, and many people have made large profits from it. Its price has risen and fallen sharply several times since its introduction. Predicting how the price of Bitcoin will change over time has therefore become an increasingly active research field. The problem is far from easy: it requires studying a huge amount of data and reacting in real time to market changes, which for cryptocurrencies are even more sudden than in traditional financial markets. This is where machine learning can play a fundamental role, since it is well suited to analyzing large amounts of data. This thesis focuses on reinforcement learning (RL), a branch of machine learning in which an algorithm learns autonomously from historical data and from interaction with an environment. The main objective is to train agents to develop and implement trading strategies that execute an order as efficiently as possible, reducing market impact and limiting exposure to market volatility. To this end, three RL techniques have been applied to real historical Bitcoin data: Fitted Q-Iteration, Proximal Policy Optimization and Deep Q-Network.
Finally, a variant of the first technique that introduces the concept of persistence was also applied: Persistent Fitted Q-Iteration. In each case, the agent was placed in a simulated environment, ABIDES, into which the real data were imported. This thesis extends previous work on the optimal execution problem and aims to verify its validity on historical data. It is also, to the best of our knowledge, the first application of persistence to the optimal execution problem.
