We introduce a novel Two-Levels Cluster-Weighted Model for hierarchical data. The model identifies subpopulations of observations (referred to as profiles) within which multilevel models behave differently. Within each profile, clusters of groups are identified with a Semi-Parametric Generalized Linear Mixed Model (SPGLMM), which is possible because of the discrete nature of its random intercepts. To estimate the model parameters, we propose an Expectation-Maximization algorithm. We conduct a simulation study to evaluate the model's ability to identify latent profiles at the unit level and clusters at the group level, and to compare its predictive performance with existing models in the literature. Results indicate that the proposed model surpasses the others not only in accuracy but also in interpretability, thanks to its ability to identify distinct profiles and clusters. The model is applicable in a variety of scenarios involving hierarchical data, such as students within schools, citizens within states, or patients within hospitals. To demonstrate its practical utility, we apply it to European educational systems using the OECD-PISA dataset, investigating the probability of being a low-performing student in mathematics across student profiles and countries. The analysis profiles students and groups countries into clusters in order to examine the impact of each cluster on student preparation.
Semi-parametric cluster-weighted multilevel models for two-levels clustering
CAMPLESE, GRETA
2022/2023
File | Description | Size | Format
---|---|---|---
Executive_Summary-GretaCamplese-10628838.pdf | Executive Summary (openly accessible online) | 609.14 kB | Adobe PDF
Thesis-GretaCamplese-10628838.pdf | Thesis (openly accessible online) | 1.72 MB | Adobe PDF
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/218249