Predicting Italian political party affiliation in Meta Ads Library Data: a comparative study of machine learning approaches
Yahyanejad, Alireza
2024/2025
Abstract
This study explores the use of machine learning to classify Italian political party affiliation in Meta Ads Library data, addressing the need for transparency in digital political campaigning within Italy’s multi-party system. Motivated by the potential of computational methods to enhance electoral accountability, it investigates a range of feature representations and classifiers on a complex, high-dimensional dataset marked by class imbalance. The research evaluates numeric attributes (e.g., ad duration, targeting criteria), text embeddings (BERT, Word2Vec, TF-IDF), their combined representation, and reduced forms obtained via PCA and SVD, using Support Vector Machines, Logistic Regression, Random Forests, and XGBoost, each with and without SMOTE for class balancing. A dataset of 8,635 ads from seven Italian political parties, including Partito Democratico and Lega, was preprocessed and analyzed. XGBoost with combined features achieved the highest test accuracy, 0.89, followed by embeddings alone at 0.86, while accuracy with numeric features alone ranged from 0.20 to 0.52. Visualizations, such as a heatmap, highlight the superior performance of combined and embedding-based representations, with SMOTE improving outcomes only in specific low-dimensional cases. These findings provide a scalable method for analyzing political advertisements, offering insights into party strategies and supporting transparency in digital campaigns. Future work could leverage larger datasets or more advanced embeddings to further improve performance.

File | Size | Format |
---|---|---|
2025_07_Yahyanejad.pdf (not accessible). Description: Text of the Document | 1.88 MB | Adobe PDF |
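The abstract mentions SMOTE as the class-balancing step. As a rough illustration of the idea only, the sketch below generates synthetic minority-class points by interpolating between a sample and one of its nearest same-class neighbours. The function name `smote_oversample`, its parameters, and the toy feature vectors are assumptions made for this example; the record does not say which implementation or settings the thesis used.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors for an under-represented class (made-up numbers).
minority_ads = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority_ads, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, oversampling of this kind adds no information outside the region the minority class already occupies, which may be one reason the reported gains from SMOTE are limited to specific cases.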
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/239957