Efficient inference for product partition models with covariate dependent prior
Dalla Vecchia, Gabriele;Bai, Andrea Giulia
2023/2024
Abstract
In statistics and machine learning, accurately estimating latent partitions of data poses significant challenges. This thesis explores Product Partition Models (PPMs), a class of nonparametric Bayesian methods primarily used for clustering. The focus is specifically on covariate-dependent PPMs, which incorporate covariates into the model to create clusters that better reflect underlying similarities among observations. To implement these models, the research introduces an enhanced algorithm called Split-Merge, designed to improve their efficiency and accuracy. Its performance is then compared with that of the well-known Gibbs sampler. The theoretical background and practical applications of both methods are discussed in detail. The effectiveness of the Split-Merge method applied to PPMs is evaluated through extensive simulations and applications to real-world case studies. The results demonstrate both the efficiency of the proposed algorithm and the efficacy of the covariate-based PPMs, highlighting their strengths along with their limitations.

File | Size | Format | |
---|---|---|---|
2024_12_Bai_DallaVecchia_ExecutiveSummary_02.pdf (accessible online to authorized users only) | 544.38 kB | Adobe PDF | View/Open |
2024_12_Bai_DallaVecchia_Tesi_01.pdf (accessible online to authorized users only) | 4.08 MB | Adobe PDF | View/Open |
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/230682