This thesis analyzes one-day log-return forecasting for cryptocurrencies under a rigorous experimental protocol designed to reduce leakage, target inconsistencies, and validation bias. The study covers six assets (BTC, ETH, ADA, DOGE, XMR, XRP) and compares leakage-aware statistical baselines, compact transformers, and a multi-foundation zero-shot benchmark (Chronos-2, TimesFM, Moirai, Lag-Llama). Evaluation combines walk-forward testing, point and probabilistic metrics, economic indicators that account for transaction costs, and explicit statistical inference. The results show that linear ridge and Chronos-2 form the most robust group on point-track metrics, while compact transformers are on average less stable. In the paired experiments, adding aggregate sentiment features degrades performance on average relative to the financial-only configuration. The evidence supports one main conclusion: in this data regime, the quality of the causal protocol and of the evaluation matters more than architectural complexity. In addition, aggregate social signals require stronger temporal validation before they can be considered reliable predictors.
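The target and split scheme summarized above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the thesis code. The helper names (`log_returns`, `walk_forward_splits`), the expanding training window, and the `gap` purge parameter are all hypothetical, because the abstract gives neither window lengths nor a purge size.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def log_returns(prices: Sequence[float]) -> List[float]:
    """One-day log-returns r_t = ln(P_t / P_{t-1}); the output is one element shorter."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]


def walk_forward_splits(
    n: int, min_train: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs for an expanding-window walk-forward scheme.

    Hypothetical helper: every training index precedes every test index, and
    `gap` observations at the boundary are discarded as a simple purge, so that
    look-ahead labels of training rows cannot overlap the test window.
    """
    train_end = min_train  # exclusive end of the current training window
    while train_end + gap + test_size <= n:
        test_start = train_end + gap
        yield range(0, train_end), range(test_start, test_start + test_size)
        train_end += test_size  # roll forward by one test block


# Toy prices (not thesis data): 11 prices give 10 daily log-returns.
prices = [100.0, 110.0, 99.0, 104.0, 108.0, 103.0, 107.0, 111.0, 109.0, 115.0, 118.0]
returns = log_returns(prices)
for train_idx, test_idx in walk_forward_splits(len(returns), min_train=4, test_size=2, gap=1):
    print(list(train_idx), list(test_idx))
```

Test blocks are contiguous and never overlap. Each fold's training window grows to absorb earlier test data, which mirrors how a model would be refit as time advances.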
Forecasting in cryptocurrency markets with transformer and foundation baselines
PETRACCA, LUCA
2024/2025
Abstract
This thesis studies cryptocurrency forecasting under a strict empirical protocol designed to avoid the most common pitfalls in financial machine learning: information leakage, weak train/test split design, and inconsistent target scales. The study covers six assets (BTC, ETH, ADA, DOGE, XMR, XRP) and compares leakage-aware statistical baselines, compact transformer forecasters, and a multi-foundation zero-shot benchmark (Chronos-2, TimesFM, Moirai, Lag-Llama). The prediction target is the one-day log-return. Evaluation combines walk-forward testing, probabilistic diagnostics, cost-aware trading metrics, and explicit statistical inference reporting. Across assets, linear ridge and Chronos-2 are, on average, the strongest point-track performers, while compact transformers are less stable under the same protocol. Paired experiments with and without sentiment show that adding aggregate sentiment features increases point error on average, for both baselines and transformers. Feature-ablation analysis confirms the same pattern: financial-only features remain strongest under purged validation. The inference tables report moving-block bootstrap confidence intervals, exact binomial tests of directional accuracy against 50%, and Model Confidence Set (MCS) membership for each asset. The core conclusion is that, in this data regime, robust causal preprocessing and evaluation design matter more than incremental architectural complexity. Aggregate social features also require stronger temporal validation before they can be considered reliable predictive signals.
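Two of the inference tools named above, the exact binomial directional test and the moving-block bootstrap, can be sketched in standard-library Python. This is an illustrative sketch under stated assumptions, not the thesis implementation. The function names, the two-sided alternative, the percentile interval, and the default block length and resample count are assumptions. The Model Confidence Set is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def binomial_test_half(hits: int, n: int) -> float:
    """Exact two-sided binomial p-value for H0: directional accuracy = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail probability, capped at 1.
    """
    k = min(hits, n - hits)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)


def moving_block_bootstrap_ci(
    x: Sequence[float],
    block: int,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile confidence interval for the mean of a serially dependent series.

    Overlapping blocks of length `block` are drawn with replacement and
    concatenated (then truncated) to the original length, which preserves
    short-range autocorrelation inside each block.
    """
    n = len(x)
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and len(x)")
    rng = random.Random(seed)
    n_starts = n - block + 1  # number of distinct overlapping blocks
    means: List[float] = []
    for _ in range(n_boot):
        sample: List[float] = []
        while len(sample) < n:
            start = rng.randrange(n_starts)
            sample.extend(x[start : start + block])
        means.append(sum(sample[:n]) / n)
    means.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return means[lo_idx], means[hi_idx]


# Toy 0/1 hit series (1 = forecast sign matched the realized return); not thesis data.
hit_series = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1] * 30
print(binomial_test_half(sum(hit_series), len(hit_series)))
print(moving_block_bootstrap_ci(hit_series, block=10))
```

A 0/1 hit series gives an interval for directional accuracy. A per-day loss differential between two models gives an interval for their relative error. The block length trades bias against variance and would have to be chosen for the data at hand.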
| File | Description | Size | Format |
|---|---|---|---|
| Tesi_Polimi.pdf (accessible online to authorized users only) | Forecasting in Cryptocurrency Markets with Transformer and Foundation Baselines | 3.66 MB | Adobe PDF |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/253774