Connecting the dots: a temporal analysis on the Cyber Threat Intelligence potential of social media discussions
Pellegrino, Davide Edoardo
2023/2024
Abstract
This research presents a temporal analysis of four major social media platforms (Reddit, Twitter, Discord, and Telegram) to evaluate their potential as sources of Cyber Threat Intelligence (CTI). Traditional CTI is often only reactive, documenting threats after they have had an impact, whereas underground forums have been shown to provide proactive intelligence in recent years. Because research suggests these communities may be migrating elsewhere, we investigate whether mainstream social media has become their new destination. Our methodology uses a three-step pipeline to process 199 million social media entries spanning 2007 to 2024. We first filter content using cybersecurity-specific keywords, then extract candidate Indicators of Compromise (IOCs) with regular expressions targeting hashes, IP addresses, and URLs, and finally validate these candidates against VirusTotal to determine both their maliciousness and their temporal placement. Our analysis identified over 30,900 true IOCs across all platforms, with Reddit contributing the largest share. However, the temporal analysis revealed consistently negative mean latencies across platforms, ranging from -229 days for hashes to -3737 days for URLs, indicating that social media predominantly discusses threats after they have been documented in established intelligence sources. Only 0.98% of hashes and 1.56% of URLs showed positive latency, that is, appeared on social media before being documented elsewhere. We conclude that while these platforms contain substantial CTI value, they function primarily as discussion venues for known threats rather than as early warning systems. This suggests that cybersecurity communities may have migrated to more exclusive venues, and that social media data may be more valuable for tracking malware trends than for early threat detection.
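The three-step pipeline and the latency sign convention described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the keyword list and regular expressions are hypothetical simplifications, and the VirusTotal validation step is not reproduced. The first-seen date is assumed to come from that lookup and is hard-coded in the example. Latency is taken here as first-seen date minus post date, so a positive value means the social media post came first, matching how the abstract interprets negative means.

```python
import re
from datetime import datetime, timedelta, timezone
from statistics import mean

# Hypothetical keyword list for illustration; not the thesis's actual keyword set.
CYBER_KEYWORDS = {"malware", "ransomware", "phishing", "botnet", "exploit", "ioc", "c2"}

# Simplified IOC patterns (assumptions). The \b anchors keep a 64-char SHA-256
# from also matching the 32-char MD5 or 40-char SHA-1 patterns.
IOC_PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
}


def is_cyber_relevant(text: str) -> bool:
    """Step 1: keep a post if it contains at least one keyword as a whole token."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not CYBER_KEYWORDS.isdisjoint(tokens)


def extract_iocs(text: str) -> dict[str, set[str]]:
    """Step 2: return candidate IOCs grouped by type (not yet validated)."""
    found: dict[str, set[str]] = {}
    for kind, pattern in IOC_PATTERNS.items():
        matches = {m.group(0) for m in pattern.finditer(text)}
        if kind == "url":
            # Drop punctuation that often trails a URL in running text.
            matches = {m.rstrip(".,;:!?)]") for m in matches}
        elif kind in ("md5", "sha1", "sha256"):
            # Normalize hashes to lowercase so duplicates collapse.
            matches = {m.lower() for m in matches}
        if matches:
            found[kind] = matches
    return found


def latency_days(post_time: datetime, first_seen: datetime) -> float:
    """Step 3 (timing only): positive if the post predates the first-seen date."""
    return (first_seen - post_time) / timedelta(days=1)


def summarize(latencies: list[float]) -> tuple[float, float]:
    """Mean latency in days and percentage of IOCs with positive latency."""
    positive = sum(1 for d in latencies if d > 0)
    return mean(latencies), 100.0 * positive / len(latencies)


if __name__ == "__main__":
    post = "New ransomware dropper at http://203.0.113.7/payload.exe, sha256 " + "ab" * 32
    if is_cyber_relevant(post):
        print(extract_iocs(post))
    # Assumed first-seen date from the validation source (e.g. VirusTotal); hard-coded here.
    posted = datetime(2024, 3, 1, tzinfo=timezone.utc)
    first_seen = datetime(2024, 2, 20, tzinfo=timezone.utc)
    print(latency_days(posted, first_seen))  # -10.0: documented before the post
```

Regex candidates will inevitably include false positives (version strings that look like IPv4 addresses, benign URLs, hex identifiers that are not file hashes), which is why a validation step against an external source is needed before any IOC is counted.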
File | Description | Size | Format | Access
---|---|---|---|---
2025_04_Pellegrino_Tesi.pdf | Thesis text | 1.72 MB | Adobe PDF | Authorized users only
2025_04_Pellegrino_Executive Summary.pdf | Executive summary | 744.48 kB | Adobe PDF | Authorized users only
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/235177