This thesis explores two complementary approaches to statistical arbitrage on a universe of ETFs, with the aim of identifying temporary price misalignments through systematic, data-driven methods. The first strategy extends classical pairs trading to a multivariate setting by constructing triplets of cointegrated ETFs, an approach referred to as Triplets Trading. Candidate triplets are selected through unsupervised clustering, preceded by dimensionality reduction via Principal Component Analysis (PCA). For each selected triplet, a stationary spread is estimated with Johansen's cointegration method, and trading signals are generated by a strategy that generalizes the concept of Bollinger Bands using dynamic thresholds based on the z-score. The second strategy adopts an innovative approach, based on a non-linear anomaly detection framework that combines Autoencoders and Isolation Forest. The Autoencoder is trained to learn a latent representation of the cross-sectional behavior of ETF returns. Temporal anomalies are identified in the most stable latent component through the Isolation Forest and subsequently attributed to individual ETFs on the basis of reconstruction error. Trading signals are generated from these localized deviations, assuming mean-reversion logic. Both strategies are evaluated on intraday ETF data through a realistic backtest that includes transaction costs, cumulative profit, drawdown, and Sharpe Ratio. While the Triplets Trading approach shows greater robustness in the presence of market frictions, the Autoencoder-based strategy represents a flexible alternative capable of capturing complex, non-linear market dynamics. The results highlight the trade-off between interpretability and flexibility in modern statistical arbitrage and suggest that combining econometric models with machine learning techniques can improve the design of market-neutral strategies.
Statistical arbitrage strategies on ETFs: a comparative analysis of triplet trading and autoencoder-based anomaly detection
Squadrone, Vincenzo
2024/2025
Abstract
This thesis investigates two complementary approaches to statistical arbitrage on a universe of ETFs, aiming to identify short-term mispricings through systematic, data-driven methods. The first strategy extends classical pairs trading to a multivariate setting by constructing cointegrated triplets of ETFs, an approach referred to as Triplets Trading. Candidate triplets are selected through unsupervised clustering after dimensionality reduction via Principal Component Analysis (PCA). A stationary spread is then estimated using Johansen's cointegration method, and trading signals are generated by a rule that generalizes the concept of Bollinger Bands using dynamic z-score thresholds. The second strategy adopts a non-linear anomaly detection framework based on Autoencoders and Isolation Forest. The Autoencoder is trained to learn a latent representation of the cross-sectional behavior of ETF returns. Temporal anomalies are detected in the most stable latent component using the Isolation Forest algorithm and are subsequently attributed to specific ETFs based on their reconstruction error. Trading signals are derived from these localized deviations under a cross-sectional mean-reversion assumption. Both strategies are evaluated on intraday ETF data using realistic backtesting procedures that account for transaction costs and report cumulative profit, drawdowns, and Sharpe Ratios. While the Triplets Trading approach shows more robust performance under frictions, the Autoencoder-based method offers a flexible alternative that captures more complex, non-linear market dynamics. The results highlight the trade-offs between interpretability and flexibility in modern statistical arbitrage, and suggest that combining econometric structure with machine learning techniques can enhance the design of market-neutral trading strategies.

| File | Description | Availability | Size | Format |
|---|---|---|---|---|
| 2025_07_Squadrone_Tesi.pdf | Thesis | Publicly accessible online from 1 July 2026 | 2.76 MB | Adobe PDF |
| 2025_07_Squadrone_Executive Summary.pdf | Executive Summary | Publicly accessible online from 1 July 2026 | 725.47 kB | Adobe PDF |
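
The abstract describes trading signals generated from z-score bands around a stationary spread. As a rough illustration only, the following Python sketch shows one way such a band rule could look. It is not the thesis's implementation: the function name `zscore_signals`, the trailing-window estimator, and the fixed `entry` and `exit_` thresholds are all assumptions made here for illustration, whereas the thesis uses dynamic thresholds whose construction is not given in the abstract.

```python
from statistics import mean, stdev


def zscore_signals(spread, window=20, entry=2.0, exit_=0.5):
    """Return one position per observation of a spread series.

    +1 = long the spread, -1 = short the spread, 0 = flat.
    All parameter names and defaults are hypothetical choices.
    """
    positions = []
    pos = 0
    for t in range(len(spread)):
        if t < window:
            # Not enough history to estimate the band yet.
            positions.append(0)
            continue
        # Trailing window that excludes the current observation,
        # so the signal uses only past data.
        hist = spread[t - window:t]
        mu, sigma = mean(hist), stdev(hist)
        if sigma == 0:
            # Degenerate band: keep the current position unchanged.
            positions.append(pos)
            continue
        z = (spread[t] - mu) / sigma
        if pos == 0:
            if z > entry:
                pos = -1  # spread unusually high: bet on a fall
            elif z < -entry:
                pos = 1   # spread unusually low: bet on a rise
        elif abs(z) < exit_:
            pos = 0       # spread back near its mean: close the trade
        positions.append(pos)
    return positions


# Toy spread: 20 small oscillations, one large spike, then a return to zero.
spread = [0.0, 0.1, -0.1, 0.05, -0.05] * 4 + [1.0, 0.0]
positions = zscore_signals(spread)
print(positions)
```

With `entry` and `exit_` held constant this reduces to a classical Bollinger-style rule; replacing them with time-varying values would be one way to move toward the dynamic thresholds the abstract mentions.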
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/239970