Reinforcement learning for fixed income futures trading: an online multi-expert approach

Automated trading has shown significant potential for developing profitable strategies. However, profitability depends on market stability, which is often disrupted by frequent regime changes, making model validation difficult. In this work, we applied the Proximal Policy Optimization (PPO) reinforcement learning algorithm to the Long Term Euro-BTP Futures (FBTP) market. The goal was to develop a model capable of finding profitable trading strategies that persist over time in the FBTP market. To achieve this, we trained PPO agents to trade one contract of the FBTP futures using market orders, with Limit Order Book (LOB) data at a 1-minute frequency. The agents were trained on data from 2014 to 2016, validated on data from 2017 to 2019, and tested on data from 2020 to 2022. Hyperparameters were tuned using the Tree-structured Parzen Estimator (TPE) method. We considered seven different state characterizations, each differing in its LOB-based features. For each feature group, we conducted a preliminary analysis of feature importance through correlation with the target and exposure analysis at different horizons. To ensure robust evaluation, we ran each experiment across multiple trading frequencies, ten random seeds, and five different starting trading minutes. Additionally, to address market non-stationarity, an adaptive online learning layer, Optimistic Adapt ML Prod (OAMP), was built on top of the PPO agents to select the best strategy among a set of experts. The agents' performance on test data shows that, even in a highly stochastic and non-stationary environment, they can create long-term profitable trading strategies, empirically validating the proposed methodology.

Automated trading has shown considerable potential for creating profitable trading strategies. However, this depends on market stability, which is often compromised by frequent regime changes, making model validation difficult. In this work, we applied a Reinforcement Learning (RL) algorithm, Proximal Policy Optimization (PPO), to the Long Term Euro-BTP Futures (FBTP) market. The goal was to develop a model capable of finding profitable trading strategies that persist over time in the FBTP market. To this end, we trained PPO agents to trade one contract of the FBTP futures using market orders, with Limit Order Book (LOB) data at a 1-minute frequency. Training used data from 2014 to 2016, with validation on 2017 to 2019 and testing on 2020 to 2022. The parameters were optimized using the Tree-structured Parzen Estimator (TPE) method. Seven different state representations were considered, each varying according to different aspects of the LOB. For each group, a preliminary analysis of feature importance was conducted through correlation with the target, together with an analysis of feature exposure over different time horizons. To ensure a robust evaluation, we ran each experiment by training at different trading frequencies, with ten random seeds and five different starting trading minutes. In addition, to address market non-stationarity, an adaptive online learning module, Optimistic Adapt ML Prod (OAMP), was developed and integrated with the PPO agents to select the best strategy among a set of experts. The agents' performance on test data shows that, even in a highly stochastic and non-stationary environment, the agents are able to create long-term profitable trading strategies, empirically validating the proposed methodology.
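To make the idea of an online layer that selects among experts more concrete, the following is a minimal sketch of a generic exponentially weighted (Hedge-style) scheme. It is not the OAMP algorithm used in this work, only a simpler stand-in for the same family of methods. The function name `run_hedge`, the learning rate `eta`, and the toy return values are assumptions introduced purely for illustration.

```python
import math
from typing import List, Sequence, Tuple


def run_hedge(
    expert_returns: Sequence[Sequence[float]], eta: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Generic Hedge-style weighting over experts (illustrative, not OAMP).

    expert_returns[t][k] is the return of expert k at step t.
    Returns the final weights and the weighted return obtained at each step.
    """
    n_experts = len(expert_returns[0])
    # Start from uniform weights: no expert is preferred a priori.
    weights = [1.0 / n_experts] * n_experts
    combined_returns: List[float] = []

    for step_returns in expert_returns:
        # The combination is scored with the weights chosen BEFORE
        # observing this step's returns, as in an online setting.
        combined_returns.append(
            sum(w * r for w, r in zip(weights, step_returns))
        )
        # Multiplicative update: experts with higher returns gain weight.
        scaled = [w * math.exp(eta * r) for w, r in zip(weights, step_returns)]
        total = sum(scaled)
        weights = [s / total for s in scaled]

    return weights, combined_returns


if __name__ == "__main__":
    # Toy data (hypothetical): expert 0 consistently outperforms expert 1.
    toy_returns = [[0.1, -0.1]] * 5
    final_weights, combined = run_hedge(toy_returns, eta=1.0)
    print(final_weights, combined)
```

In this toy run the weight mass shifts toward the expert that has performed better so far. In a non-stationary market, the same kind of update lets the combination move away from experts whose strategies stop working.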