Inference in functional data analysis framework: simulation studies and code optimization

This work focuses on inferential methods for functional data. It gives an overview of inferential methods that can select the statistically significant intervals of the domain, with particular attention to permutation-based solutions. The properties of the Interval Testing Procedure are explored through simulations, and the effect of smoothing on the inferential analysis is a central concern. A simulation study compares the Interval Testing Procedure with the Benjamini-Hochberg, Bonferroni-Holm and Bonferroni multiple testing procedures. The chosen metrics are the Family Wise Error Rate, the Rejection Rate of the False null hypotheses, the Rejection Rate of the True null hypotheses and the Power. The hypothesis testing problem is the two-sided distributional comparison between two independent populations of functions. The difference in mean between the populations is localized in an interval at the center of the domain. The B-spline basis expansion is used throughout the simulations, and both the Regression splines and Smoothing splines methods are considered. The smoothing-related parameters of interest are the order of the basis elements, the number of basis elements and the smoothing parameter. The parameters of interest that determine the data set are the number of evaluations, the standard deviation of the additive Normal noise and the number of statistical units. Of particular interest are the differences between the Interval Testing Procedure and the Benjamini-Hochberg procedure in their ability to make true discoveries, given that the former controls the Family Wise Error Rate on intervals while the latter ensures only weak control of the Family Wise Error Rate. Best practices can be deduced from the simulation results, such as the optimality of cubic splines with a sufficiently high number of basis elements for the Interval Testing Procedure in the case of discontinuous functional data.
In these scenarios, the Interval Testing Procedure and the Benjamini-Hochberg procedure perform equivalently in terms of the Rejection Rate of the False null hypotheses, which is a more precise measure of the ability to make true discoveries than the Power. Finally, for the Interval Testing Procedure it is generally better to choose a relatively high number of basis elements. The code for the simulations has been implemented in R, using a modified version of the source code of the fdatest R package. The most important update is the C implementation of the combining matrix construction, which is the most computationally expensive task in the Interval Testing Procedure algorithm for the two-population framework. The implementation of the Interval Testing Procedure used here works directly on an object of the functional data class. Smoothing is therefore left to the user, which avoids the subjective choices that the original version of fdatest had to make automatically. These changes yield a significant reduction in execution time and a simpler interface.
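As a rough illustration of the comparison described above, the following sketch computes adjusted p-values for the Bonferroni, Bonferroni-Holm and Benjamini-Hochberg procedures and summarizes rejections over simulation runs. It is written in Python for self-containment and is not the thesis code, which is in R and C. All function names are hypothetical, and the metric definitions (Family Wise Error Rate as the fraction of runs with at least one rejected true null, rejection rates as average fractions of rejected hypotheses) are assumptions, since the abstract does not spell them out. The Interval Testing Procedure itself and the Power are not covered.

```python
def bonferroni(pvals):
    """Bonferroni adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Bonferroni-Holm step-down adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank among ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def summarize(rejections, false_null):
    """Summarize rejections over simulation runs (assumed metric definitions).

    rejections: list of runs, each a list of booleans (True = null rejected).
    false_null: list of booleans (True = the null hypothesis is actually false).
    """
    n_runs = len(rejections)
    n_false = sum(false_null)
    n_true = len(false_null) - n_false
    fwer_hits = 0
    rr_false = 0.0
    rr_true = 0.0
    for run in rejections:
        false_rej = sum(r and f for r, f in zip(run, false_null))
        true_rej = sum(r and not f for r, f in zip(run, false_null))
        fwer_hits += true_rej > 0  # at least one true null rejected in this run
        rr_false += false_rej / n_false if n_false else 0.0
        rr_true += true_rej / n_true if n_true else 0.0
    return {
        "fwer": fwer_hits / n_runs,
        "rejection_rate_false_nulls": rr_false / n_runs,
        "rejection_rate_true_nulls": rr_true / n_runs,
    }
```

In a simulation, a hypothesis would be rejected in a given run when its adjusted p-value falls at or below the chosen level, and the resulting boolean vectors would be passed to `summarize` together with the known location of the mean difference.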
