WMD: a scalable web-malware detection system

Over the last decade, the massive growth of the Internet has made web-based attacks a major threat. Billions of users access the Internet every day and visit a wide variety of websites, unaware that a compromised or malicious website could exploit vulnerabilities in their browser or device to gain control of their system. Because the Internet is so popular and many devices and browsers are outdated, websites have a long history of successfully exploiting unsuspecting users in the wild simply by getting them to visit a webpage. Web malware usually leverages bugs or features in a browser's JavaScript API to compromise the client host. Over the years, attackers have evolved their malware from simple static JavaScript to more complex techniques, including obfuscation, detection evasion, and sophisticated browser exploitation. This work proposes a scalable system for identifying web malware, based on the hypothesis that web malware interacts with the client's browser, network, and operating system in unusual ways. The system shows in detail the malicious features it extracts and the scores it uses to judge a website. By contrast, the web malware detection websites we reviewed give little explanation of their detection process. Finally, we scanned almost 4 million websites to detect malware that accesses private network addresses. Some devices connected to a private network, such as home routers or IoT devices, may assume that only authorized people have access to the network. This assumption does not always hold, because a webpage running in a browser can reach the private network, and web malware could use this capability to target the devices connected to it.
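To illustrate the private-network check described in the abstract, the following is a minimal sketch and not the system's actual implementation. It assumes the scanner has already recorded the URLs a page requested. The function name `targets_private_network` and the sample URLs are hypothetical, and the sketch only inspects literal IP addresses (hostnames are not resolved).

```python
import ipaddress
from urllib.parse import urlsplit


def targets_private_network(url: str) -> bool:
    """Return True if the URL's host is a literal IP address in a
    private, loopback, or link-local range (hypothetical helper)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # The host is a domain name; resolving it is out of scope here.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


# Hypothetical requests observed while a page was loading.
requests_seen = [
    "http://192.168.1.1/admin",
    "https://example.com/script.js",
    "http://10.0.0.5:8080/status",
    "http://[fe80::1]/",
]

# Keep only the requests that point into a private address range.
flagged = [u for u in requests_seen if targets_private_network(u)]
print(flagged)
```

A real scanner would probably also need to resolve hostnames, since a public domain name can point to a private address. That step is omitted here to keep the sketch self-contained.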
