Data-driven Curation, Learning and Analysis for Inferring Evolving IoT Botnets in the Wild

The insecurity of the Internet-of-Things (IoT) paradigm continues to wreak havoc in consumer and critical infrastructure realms. Several challenges impede addressing IoT security at large, including the lack of IoT-centric data that can be collected, analyzed and correlated, owing to the highly heterogeneous nature of such devices and their widespread deployment in Internet-wide environments. To this end, this paper explores macroscopic, passive empirical data to shed light on this evolving threat phenomenon. The work aims not only to classify and infer Internet-scale compromised IoT devices solely by observing one-way network traffic, but also to uncover, track and report on orchestrated "in the wild" IoT botnets. First, to prepare such data for effective use, a novel probabilistic model is designed and developed to cleanse the traffic of noise samples (i.e., misconfiguration traffic). Subsequently, several shallow and deep learning models are evaluated, leading to the design and development of a multi-window convolutional neural network, trained on active and passive measurements, that accurately identifies compromised IoT devices. Then, to infer orchestrated and unsolicited activities generated by well-coordinated IoT botnets, hierarchical agglomerative clustering is applied over a set of innovative and efficient network features. By analyzing 3.6 TB of recent darknet traffic, the proposed approach uncovers 440,000 compromised IoT devices and generates evidence-based artifacts related to 350 IoT botnets. While some of the detected botnets correspond to previously documented campaigns such as Hide and Seek, Hajime and Fbot, others illustrate evolving threats, such as botnets with cryptojacking capabilities and botnets that target industrial control system communication and control services.
