On data-driven curation, learning, and analysis for inferring evolving Internet-of-Things (IoT) botnets in the wild

Abstract The insecurity of the Internet-of-Things (IoT) paradigm continues to wreak havoc in consumer and critical infrastructures. The highly heterogeneous nature of IoT devices and their widespread deployment have given rise to several key security and measurement-based challenges, significantly hindering the collection, analysis and correlation of IoT-centric data. To this end, this paper explores macroscopic, passive empirical data to shed light on this evolving threat phenomenon. The proposed work aims to classify and infer Internet-scale compromised IoT devices by observing only one-way network traffic, while also uncovering, reporting and thoroughly analyzing "in the wild" IoT botnets. To prepare a relevant dataset, a novel probabilistic model is developed to cleanse unrelated traffic by removing noise samples (i.e., misconfigured network traffic). Subsequently, several shallow and deep learning models are evaluated in an effort to train an effective multi-window convolutional neural network. By leveraging active and passive measurements when generating the training dataset, the neural network aims to accurately identify compromised IoT devices. Then, to infer orchestrated and unsolicited activities generated by well-coordinated IoT botnets, hierarchical agglomerative clustering is applied to a set of innovative and efficient network features. Analyzing 3.6 TB of recently captured darknet traffic revealed 440,000 compromised IoT devices and generated evidence-based artifacts related to 350 IoT botnets. Moreover, by thoroughly analyzing the inferred campaigns, we reveal their scanning behaviors, packet inter-arrival times, employed rates and geo-distributions. Although several campaigns differ significantly in these aspects, some are more distinguishable, either by being limited to specific geo-locations or by scanning random ports in addition to their core targets. While many of the inferred botnets belong to previously documented campaigns such as Hide and Seek, Hajime and Fbot, newly discovered events portray the evolving nature of such IoT threats by demonstrating growing cryptojacking capabilities or by targeting industrial control services. To motivate empirical (and operational) IoT cyber security initiatives and to aid the reproducibility of the obtained results, we make the source code of all the developed methods and techniques available to the research community at large.
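The clustering step mentioned in the abstract can be illustrated with a minimal sketch of hierarchical agglomerative clustering. The abstract does not state the feature set, the linkage criterion or the stopping rule, so everything below is an assumption made for illustration: each source is described by three hypothetical features scaled to [0, 1] (scan rate, mean packet inter-arrival time, share of packets sent to one port), the linkage is average Euclidean distance, and merging stops at a hypothetical distance threshold.

```python
from itertools import combinations
from math import dist


def agglomerative(points, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold` (a hypothetical stopping rule, not taken from the paper).
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (x, y), d = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical, already-normalized feature vectors per scanning source:
# (scan rate, mean inter-arrival time, share of packets to one port).
features = [
    (0.90, 0.10, 0.95),  # source 0
    (0.88, 0.12, 0.93),  # source 1
    (0.10, 0.80, 0.05),  # source 2
    (0.12, 0.78, 0.07),  # source 3
    (0.50, 0.50, 0.50),  # source 4
]

campaigns = agglomerative(features, threshold=0.2)
print(campaigns)
```

In this toy input, sources 0 and 1 behave alike, as do sources 2 and 3, while source 4 stays on its own; sources grouped together would be candidates for belonging to the same coordinated campaign.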

[71]  Mourad Debbabi,et al.  A novel cyber security capability: Inferring Internet-scale infections by correlating malware and probing activities , 2016, Comput. Networks.