Causal distributional effects for functional data
Sparviero, Michele
2024/2025
Abstract
In this research, we propose a model for causal inference, an area of investigation that is critical in many application domains. The goal is to provide a tool for analyzing the distributional effects induced by a binary treatment variable, rather than average effects. The proposed model consists of two phases. The first, in line with the vast majority of academic research, constructs Bayesian potential outcomes for functional observations under the two treatment levels. The second measures the causal effect starting from the two sets of curves, one for each treatment. To this end, we develop a theoretical framework for handling the empirical cumulative distribution function (CDF) of high-dimensional data, which is a nontrivial task in functional data analysis. Building on this framework, we provide a practical method for computing dissimilarities between CDFs by means of optimal transport. We evaluate the model on both synthetic datasets and real-world data from Parkinson's disease studies, and we compare its performance with well-established methods in the field, such as kernel-based approaches and distribution function models, to assess the effectiveness and robustness of our proposal.
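As a rough illustration of what a dissimilarity between two empirical CDFs based on optimal transport looks like, the sketch below computes the 1-Wasserstein distance between two scalar samples by integrating the absolute difference of their empirical CDFs. This is only a toy, one-dimensional case under our own assumptions: the function name `wasserstein1_empirical` and the sample values are hypothetical, and the thesis itself addresses functional, high-dimensional observations, for which this sketch is not the proposed method.

```python
import numpy as np


def wasserstein1_empirical(x, y):
    """1-Wasserstein distance between two scalar samples.

    Hypothetical helper for illustration: in one dimension the distance
    equals the integral of |F_x(t) - F_y(t)| over t, where F_x and F_y
    are the empirical CDFs of the two samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))

    # Both empirical CDFs are piecewise constant between pooled sample points.
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)

    # Value of each empirical CDF on the interval [grid[i], grid[i + 1]).
    cdf_x = np.searchsorted(x, grid[:-1], side="right") / x.size
    cdf_y = np.searchsorted(y, grid[:-1], side="right") / y.size

    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


# Made-up outcomes under control and treatment (a pure shift by one unit).
control = [0.0, 1.0, 2.0]
treated = [1.0, 2.0, 3.0]
distance = wasserstein1_empirical(control, treated)
print(distance)
```

For a pure location shift such as the one above, the distance should be close to the size of the shift, which is what makes this kind of measure sensitive to distributional changes and not only to changes in shape-free summaries. The samples may have different sizes, since the comparison is made on the pooled grid of observed values.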
File | Description | Access | Size | Format
---|---|---|---|---
2025_04_Sparviero_Thesis_01.pdf | Thesis | Accessible online to everyone | 2.64 MB | Adobe PDF
2025_04_Sparviero_Executive_Summary_02.pdf | Executive Summary | Accessible online to authorized users only | 706.06 kB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/235610