Automatic quantification of cyberattack severity from CTI reports
BORGONOVO, SAMUELE
2024/2025
Abstract
Cybersecurity reports and articles contain critical information about the severity, impact, and potential consequences of cyberattacks, but this information is typically scattered across unstructured and heterogeneous text sources, which makes systematic analysis, comparison, and knowledge sharing difficult. The growing number of documented cyber incidents worldwide highlights the need for automated tools that can process large volumes of text and produce structured, actionable insights. This thesis presents a novel framework for automatically extracting quantitative severity metrics from cybersecurity reports, based on the existing metric of the European Repository of Cyber Incidents (EuRepoC), which provides standardized indicators for assessing the intensity and impact of attacks. The framework leverages Large Language Models (LLMs) and explores multiple modeling approaches: hybrid models (BERT combined with generative models for answer validation) and fully generative architectures (such as compact instruction-based models and question-answering systems). The system extracts twelve key variables, grouped into two categories, which are then used to compute two structured severity indicators: the Cyber Intensity Indicator and the Impact Indicator. Experimental validation shows that generative models, particularly Qwen3-14B, achieve the best overall performance, with a correct-score accuracy of approximately 71–73% and a Default Weighted Score of 38–38.5, while some configurations based on DeepSeek-R1-Distill-Llama-8B reach Default Weighted Score values of around 40. These results indicate that reliable automatic severity extraction from complex cybersecurity texts is achievable, confirming the effectiveness of the proposed approach and its potential to scale across diverse cyber incident reports.
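The step from extracted variables to the two indicators, and the correct-score accuracy used in the evaluation, can be illustrated with a minimal Python sketch. It assumes that each indicator is an unweighted sum of the integer scores of its category's variables, and that correct-score accuracy is the share of variables whose predicted score exactly matches the annotated one. Neither the actual EuRepoC aggregation rules nor the thesis's exact metric definitions are given in this abstract, and the variable names below are hypothetical placeholders for a few of the twelve variables.

```python
# Minimal sketch, NOT the EuRepoC specification: variable names, score
# scales, and the unweighted-sum aggregation are hypothetical assumptions.

# Hypothetical placeholders for a few of the twelve extracted variables,
# split into the two categories.
INTENSITY_VARS = ("data_theft", "disruption", "hijacking", "physical_effects")
IMPACT_VARS = ("functional_impact", "intelligence_impact", "political_impact", "economic_impact")


def indicator(scores: dict[str, int], variables: tuple[str, ...]) -> int:
    """Aggregate one category's scores (assumed: unweighted sum; missing variables count as 0)."""
    return sum(scores.get(var, 0) for var in variables)


def correct_score_accuracy(pred: dict[str, int], gold: dict[str, int]) -> float:
    """Share of annotated variables whose predicted score exactly matches the gold score."""
    if not gold:
        raise ValueError("gold annotations must not be empty")
    hits = sum(1 for var, score in gold.items() if pred.get(var) == score)
    return hits / len(gold)


# Illustrative scores for one report (invented values, not thesis data).
pred = {"data_theft": 2, "disruption": 1, "hijacking": 0, "functional_impact": 3, "political_impact": 1}
gold = {"data_theft": 2, "disruption": 2, "hijacking": 0, "functional_impact": 3, "political_impact": 0}

print(indicator(pred, INTENSITY_VARS))  # 3
print(indicator(pred, IMPACT_VARS))  # 4
print(correct_score_accuracy(pred, gold))  # 0.6
```

A weighted variant would only change `indicator`, for example by multiplying each score by a per-variable weight before summing.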
In addition, a final longitudinal analysis was performed on the Cyber Events Database to evaluate the consistency and generalizability of the proposed framework across a broader set of incidents. The results show that the two extracted metrics follow a remarkably similar trend over the years, suggesting that the most invasive and technically disruptive attacks are often those with the greatest real-world consequences. Moreover, while the average impact of attacks has remained relatively stable, the total number of incidents has increased significantly, with ransomware emerging as the predominant attack type in recent years. A further comparative analysis by industry and actor type reveals distinct behavioral patterns: criminal groups dominate in both intensity and impact, while nation-state actors focus on strategically significant sectors such as utilities and public administration. Finally, the study shows that certain malware families, despite being mentioned less frequently in media or threat intelligence sources, sometimes exhibit higher severity scores, highlighting the value of complementary automated indicators for prioritizing emerging and potentially underestimated threats. Together, these findings attest to the robustness of the framework and its ability to capture meaningful patterns in the evolving cyber threat landscape. The contributions of this thesis include the first automated application of the EuRepoC metric, a comprehensive comparative study of LLM-based approaches to cybersecurity information extraction, and a framework capable of generating structured threat intelligence from unstructured reports.
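The yearly comparison described in this paragraph can be sketched in a few lines of Python, assuming every incident has already been scored with both indicators and tagged with a year and an attack type. The `Incident` record, the sample values, and the use of Pearson correlation to quantify how closely the intensity and impact trends move together are illustrative assumptions: the abstract does not say which statistic the thesis uses, and the sample data are invented, not drawn from the Cyber Events Database.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean
from typing import NamedTuple


class Incident(NamedTuple):
    """Hypothetical scored incident record (field names are assumptions)."""
    year: int
    intensity: float
    impact: float
    attack_type: str


def yearly_summary(incidents: list[Incident]) -> dict[int, dict]:
    """Per year: incident count, mean intensity, mean impact, most frequent attack type."""
    by_year: dict[int, list[Incident]] = defaultdict(list)
    for inc in incidents:
        by_year[inc.year].append(inc)
    summary = {}
    for year in sorted(by_year):
        rows = by_year[year]
        summary[year] = {
            "count": len(rows),
            "mean_intensity": mean(r.intensity for r in rows),
            "mean_impact": mean(r.impact for r in rows),
            "top_type": Counter(r.attack_type for r in rows).most_common(1)[0][0],
        }
    return summary


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long series (e.g. yearly means)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Invented sample data, for illustration only.
incidents = [
    Incident(2019, 2.0, 1.0, "phishing"),
    Incident(2020, 3.0, 2.0, "ransomware"),
    Incident(2020, 5.0, 4.0, "ransomware"),
    Incident(2021, 4.0, 3.0, "ransomware"),
    Incident(2021, 6.0, 5.0, "ddos"),
    Incident(2021, 2.0, 1.0, "ransomware"),
]

summary = yearly_summary(incidents)
for year, stats in summary.items():
    print(year, stats)

years = list(summary)
r = pearson([summary[y]["mean_intensity"] for y in years],
            [summary[y]["mean_impact"] for y in years])
print(f"intensity/impact trend correlation: {r:.2f}")  # 1.00 for this toy data
```

If the indicators are ordinal rather than interval-scaled, a rank-based measure such as Spearman correlation may be the more appropriate choice.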
| File | Description | Size | Format | Access |
|---|---|---|---|---|
| 2025_12_Borgonovo_Executive_Summary.pdf | Executive Summary | 2.14 MB | Adobe PDF | Publicly accessible online |
| 2025_12_Borgonovo_Tesi.pdf | Thesis | 4.6 MB | Adobe PDF | Publicly accessible online |
Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/10589/246533