This thesis proposes a spatio-temporal, cross-social methodology to automatically and dynamically mine relevant keywords for ongoing emergency events (e.g. floods, hurricanes, earthquakes), with the ultimate goal of crawling from social media as many posts with relevant media as possible. Extracting many such posts automatically and in a timely manner is key to many activities, for example providing useful information for rapidly creating crisis maps during emergency events. Standard approaches use only predefined keywords, at the risk of losing a significant amount of relevant information. The proposed methodology takes into account the spatio-temporal features of the monitored event to better characterize it as it evolves. Cross-social crawling makes it possible to exploit the specific features of one social media platform when crawling the others. The event-affected areas are automatically identified through density-based clustering and are updated as the event unfolds. Three types of keywords are generated incrementally over time so that relevant media can be extracted continuously as the event evolves. They characterize different aspects: the event itself, the identified areas, and the relevant points of interest (POIs). The process is iterative: as new posts with media are extracted with the newly available keywords, they are used to refine the event areas and mine new keywords. We tested our algorithm on relevant past events: two floods (with different characteristics) and one earthquake. We then evaluated the proposed methodology in terms of the precision and recall of crawling with the generated keywords, compared against a baseline consisting of media crawled through a static set of predefined keywords. The results show that the methodology extracts more than twice as many media items in many cases, generally while also substantially increasing their relevance.
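The precision and recall comparison mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the media identifiers, the three toy sets, and the `precision_recall` helper are hypothetical and are not taken from the thesis, which may compute these metrics differently.

```python
def precision_recall(crawled, relevant):
    """Return (precision, recall) of a crawled set against a relevant set.

    precision = relevant items crawled / all items crawled
    recall    = relevant items crawled / all relevant items
    Empty denominators yield 0.0 rather than raising an error.
    """
    crawled, relevant = set(crawled), set(relevant)
    hits = len(crawled & relevant)
    precision = hits / len(crawled) if crawled else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: media items judged relevant to the event.
relevant = {"m1", "m2", "m3", "m4", "m5", "m6"}

# Hypothetical crawl results: static predefined keywords (baseline)
# versus dynamically mined keywords.
baseline = {"m1", "m2", "x1", "x2"}
dynamic = {"m1", "m2", "m3", "m4", "m5", "x1"}

base_p, base_r = precision_recall(baseline, relevant)
dyn_p, dyn_r = precision_recall(dynamic, relevant)

print(f"baseline: precision={base_p:.2f} recall={base_r:.2f}")
print(f"dynamic:  precision={dyn_p:.2f} recall={dyn_r:.2f}")
```

In this toy data the dynamically mined keywords retrieve more relevant items and fewer irrelevant ones, so both metrics rise relative to the baseline. The numbers are invented and say nothing about the results reported in the thesis.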
Spatio-temporal cross-social media mining for emergency events
AUTELITANO, ANDREA
2017/2018
File | Size | Format | Access
---|---|---|---
2018_07_Autelitano.pdf (Description: Thesis text) | 14.82 MB | Adobe PDF | Accessible online only to authorized users
Documents in POLITesi are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/10589/141809