Autoencoder-based feature construction for IoT attacks clustering

Abstract. Variations in the commands executed during an attack can be used to determine the behavioural patterns of IoT attacks. Existing approaches rely on the domain knowledge of security experts to identify these behavioural patterns and to categorise and classify cyber attacks. We propose an Autoencoder (AE)-based feature construction approach that removes the dependence on manually correlating commands and generates an efficient representation by automatically learning the semantic similarity between input features extracted from command data. We apply three clustering algorithms, namely K-means, Gaussian Mixture Models (GMM) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to our data set of AE features. We discuss the resulting clustering arrangements to understand how changes in commands affect the behavioural patterns of attacks and how attacks are grouped into the same or different clusters. Evaluation of our feature construction approach shows that clustering on AE features groups attacks that share more feature values than clustering on the original features does. Moreover, we perform a comparative analysis of two existing feature extraction approaches on our data set, considering the type of analysis involved, the generalisability of the features, the coverage of the data set and the clustering arrangements. We find that the challenges identified in applying the existing approaches can be addressed with our proposed approach, and that improving the features with an AE yields more meaningful clustering interpretations.
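The pipeline described in the abstract (learn a compact AE representation of command features, then cluster on it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy binary command features, the single two-unit hidden layer, the sigmoid activations, the learning rate and the farthest-point K-means initialisation. Only K-means is shown; GMM and DBSCAN are omitted for brevity.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow for extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(rows, hidden=2, epochs=1000, lr=0.5, seed=0):
    """Fit a one-hidden-layer sigmoid autoencoder with per-sample gradient
    descent on squared reconstruction error; return the encoder function."""
    rng = random.Random(seed)
    n_in = len(rows[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b_enc = [0.0] * hidden
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_in)]
    b_dec = [0.0] * n_in

    def encode(x):
        return [sigmoid(sum(w * xi for w, xi in zip(w_enc[j], x)) + b_enc[j])
                for j in range(hidden)]

    for _ in range(epochs):
        for x in rows:
            h = encode(x)
            y = [sigmoid(sum(w * hj for w, hj in zip(w_dec[i], h)) + b_dec[i])
                 for i in range(n_in)]
            # Backpropagate the squared error through both sigmoid layers.
            d_out = [(y[i] - x[i]) * y[i] * (1.0 - y[i]) for i in range(n_in)]
            d_hid = [sum(d_out[i] * w_dec[i][j] for i in range(n_in))
                     * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for i in range(n_in):
                for j in range(hidden):
                    w_dec[i][j] -= lr * d_out[i] * h[j]
                b_dec[i] -= lr * d_out[i]
            for j in range(hidden):
                for i in range(n_in):
                    w_enc[j][i] -= lr * d_hid[j] * x[i]
                b_enc[j] -= lr * d_hid[j]
    return encode


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical features: 1 if a session ran the command, else 0.
COMMANDS = ["wget", "chmod", "sh", "uname", "cat", "ps"]
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

encode = train_autoencoder(rows)
codes = [encode(x) for x in rows]   # AE features, one 2-d code per session
labels = kmeans(codes, k=2)         # cluster sessions on the AE features
print(labels)
```

Clustering on `codes` rather than on `rows` mirrors the comparison the abstract draws between AE features and original features; on real honeypot data the feature set, network size and number of clusters would need to be chosen empirically.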
