Many real networks are inherently directed: messages, goods and money flow in specific directions, and this orientation carries crucial structural information. This thesis revisits five adjacency–based spectral clustering algorithms for directed graphs: Adjacency Spectral Embedding (ASE), Hermitian (HERM), Skew–symmetric (SKEW), Block–Cyclic (BCS) and Block–Acyclic (BAS). We provide a study on a modification of these algorithms: before k-means, the spectral coordinates are reduced to two dimensions using t-SNE, leading to an improvement of the final clustering. The spectral core of each method is left unchanged. The analysis covers the considered algorithms, implementation details, and complexity analyses. In the dense regime all spectral variants are dominated by the O(n^3) cost of the eigen/SVD step with O(n^2) memory, whereas the added Barnes–Hut t-SNE contributes only O(nlogn) and does not alter the asymptotic bottleneck. Performance is evaluated on synthetic and real graphs using both label–based metrics (NMI, F-Score, ARI) and either directionality-aware, or flow-based criteria (Cut–Imbalance and Trade–Flow). Directed Stochastic Block Models (DSBMs) show that the refinement consistently reduces the errors in the classification across a wide range of community counts. LFR benchmarks indicate that the gain persists under substantial inter–community mixing. On real-world datasets emerging from e-mail communications and banking transactions, the refined pipelines yield more interpretable clusters and improved flow structure between them. Overall, a geometric adjustment applied after the eigenspectrum analysis step reliably tightens same–community neighborhoods without changing the dominant computational cost. The result is a unified, implementation–ready toolkit for clustering directed graphs that improves empirical separability while preserving the theoretical and algorithmic foundations of the underlying spectral methods.
Molte reti reali sono intrinsecamente dirette: messaggi, beni e denaro fluiscono in direzioni specifiche e tale orientamento veicola informazioni strutturali cruciali. Questa tesi riconsidera cinque algoritmi di clustering spettrale basati sulla matrice di adiacenza per grafi diretti: Adjacency Spectral Embedding (ASE), Hermitian (HERM), Skew–symmetric (SKEW), Block–Cyclic (BCS) e Block–Acyclic (BAS). Studiamo una modifica di tali algoritmi: prima di applicare l'algoritmo k-means, le coordinate spettrali vengono ridotte a due dimensioni tramite t-SNE, con un miglioramento del clustering finale. Il nucleo spettrale di ciascun metodo rimane invariato. L’analisi copre gli algoritmi considerati, i dettagli implementativi e le analisi di complessità. Utilizzando metodi diretti, tutte le varianti spettrali sono dominate dal costo O(n^3) dovuto al passaggio di calcolo degli autovettori o di calcolo del SVD con memoria O(n^2), mentre l'implementazione di t-SNE tramite Barnes–Hut aggiunge soltanto O(nlogn) e non altera il collo di bottiglia asintotico. Le prestazioni vengono valutate su grafi sintetici e reali utilizzando sia metriche basate sulle etichette (NMI, F-Score, ARI) sia criteri sensibili alla direzionalità o al flusso (Cut-Imbalance e Trade-Flow). I Directed Stochastic Block Models (DSBM) mostrano che il raffinamento riduce sistematicamente gli errori di classificazione su un ampio intervallo di numerosità delle comunità. I benchmark LFR indicano che il miglioramento persiste anche sotto forte mescolamento inter-comunità. Su insiemi di dati reali provenienti da comunicazioni e-mail e transazioni bancarie, le pipeline affinate producono cluster più interpretabili e una struttura di flusso tra essi più coerente. Nel complesso, un aggiustamento geometrico applicato dopo il passo di analisi dello spettro restringe in modo affidabile i nodi intra-comunità senza modificare il costo computazionale dominante. Il risultato è un toolkit unificato e pronto all’uso per il clustering di grafi diretti, che migliora la separabilità empirica preservando i fondamenti teorici e algoritmici dei metodi spettrali sottostanti.
Spectral embeddings for directed graph clustering
Palumbo, Jacopo
2024/2025
Abstract
Many real networks are inherently directed: messages, goods and money flow in specific directions, and this orientation carries crucial structural information. This thesis revisits five adjacency–based spectral clustering algorithms for directed graphs: Adjacency Spectral Embedding (ASE), Hermitian (HERM), Skew–symmetric (SKEW), Block–Cyclic (BCS) and Block–Acyclic (BAS). We provide a study on a modification of these algorithms: before k-means, the spectral coordinates are reduced to two dimensions using t-SNE, leading to an improvement of the final clustering. The spectral core of each method is left unchanged. The analysis covers the considered algorithms, implementation details, and complexity analyses. In the dense regime all spectral variants are dominated by the O(n^3) cost of the eigen/SVD step with O(n^2) memory, whereas the added Barnes–Hut t-SNE contributes only O(nlogn) and does not alter the asymptotic bottleneck. Performance is evaluated on synthetic and real graphs using both label–based metrics (NMI, F-Score, ARI) and either directionality-aware, or flow-based criteria (Cut–Imbalance and Trade–Flow). Directed Stochastic Block Models (DSBMs) show that the refinement consistently reduces the errors in the classification across a wide range of community counts. LFR benchmarks indicate that the gain persists under substantial inter–community mixing. On real-world datasets emerging from e-mail communications and banking transactions, the refined pipelines yield more interpretable clusters and improved flow structure between them. Overall, a geometric adjustment applied after the eigenspectrum analysis step reliably tightens same–community neighborhoods without changing the dominant computational cost. The result is a unified, implementation–ready toolkit for clustering directed graphs that improves empirical separability while preserving the theoretical and algorithmic foundations of the underlying spectral methods.| File | Dimensione | Formato | |
|---|---|---|---|
|
2025_10_Palumbo_Thesis_01.pdf
accessibile in internet per tutti
Descrizione: Thesis
Dimensione
9.69 MB
Formato
Adobe PDF
|
9.69 MB | Adobe PDF | Visualizza/Apri |
|
2025_10_Palumbo_Executive_Summary_02.pdf
accessibile in internet per tutti
Descrizione: Executive Summary
Dimensione
1.12 MB
Formato
Adobe PDF
|
1.12 MB | Adobe PDF | Visualizza/Apri |
I documenti in POLITesi sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/10589/243589