Improved DGA Domain Names Detection and Categorization Using Deep Learning Architectures with Classical Machine Learning Algorithms

Recent malware families have largely adopted domain generation algorithms (DGAs), primarily because a DGA can generate a large number of domain names and then use only a small subset of them for actual command and control (C&C) server communication. DNS blacklisting combined with sink-holing is the most commonly used approach to block DGA C&C traffic. This is a daunting task, because the network administrator has to update the DNS blacklist continuously to keep pace with the constantly changing behavior of DGAs. Another significant direction is to intercept the DNS queries in DNS traffic and predict whether a domain name is DGA generated. Most existing methods of this kind identify groupings through clustering, estimate statistical properties for each grouping, and classify using statistical tests. This approach needs a large time window and therefore cannot be used for real-time DGA domain detection. Additionally, these techniques rely on passive DNS and NXDomain information. Integrating these different sources of information is costly, and in some cases obtaining all of it is not feasible under real-time constraints. Detecting DGAs on a per-domain basis is an alternative approach that requires no additional information. Existing per-domain detection methods are based on machine learning; they rely on feature engineering, which is time-consuming and can be easily circumvented by malware authors. More recently, deep learning has been applied to per-domain DGA detection. It requires no feature engineering and is not easily circumvented. In the existing studies of DGA detection, deep learning architectures have performed well in comparison to classical machine learning algorithms (CMLAs).
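To make the feature engineering step of the classical per-domain approach concrete, the following is a minimal sketch of hand-crafted lexical features computed from a single domain name. The choice of features (label length, character entropy, vowel ratio, digit ratio) and the function names are illustrative assumptions, not the feature set of any particular study.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Hand-crafted features from the leftmost label (hypothetical feature set)."""
    label = domain.lower().split(".")[0]
    length = len(label)
    vowels = sum(ch in "aeiou" for ch in label)
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowels / length if length else 0.0,
        "digit_ratio": digits / length if length else 0.0,
    }


# Compare a dictionary-like name with a random-looking one.
for name in ("google.com", "xjw3kq9zt1vb.net"):
    print(name, lexical_features(name))
```

Each feature here is a fixed, visible rule, which is why such features take effort to design and why a malware author who knows them can generate names that evade them.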
Building on this, in this chapter we propose a deep learning based framework named I-DGA-DC-Net, which is composed of two modules: a Domain name similarity checker and a Domain name statistical analyzer. The Domain name similarity checker uses a deep learning architecture and is compared with classical string comparison methods. These experiments are run on a publicly available data set. Domains that are not detected by the similarity checker are then passed to the statistical analyzer. This module takes raw domain names as input, captures the optimal features implicitly by passing them through a character-level embedding followed by deep learning layers, and classifies them using CMLAs. Moreover, the effectiveness of CMLAs for categorizing algorithmically generated domain names into their corresponding malware family is studied in comparison with a fully connected layer with \(\textit{softmax}\) non-linear activation function, using the AmritaDGA data set. All experiments involving deep learning architectures are run for 100 epochs with a learning rate of 0.01. The deep learning architecture-CMLA models showed higher test accuracy than the deep learning architecture-\(\textit{softmax}\) models. This is because deep learning architectures are good at obtaining high-level features, while a support vector machine (SVM) is good at constructing decision surfaces from optimal features. An SVM generally cannot learn complicated abstract and invariant features, whereas the hidden layers in deep learning architectures help to capture them.
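The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions, not the I-DGA-DC-Net implementation: normalized Levenshtein similarity stands in for one classical string comparison method, and the known-domain list, the 0.75 threshold, the character vocabulary, and the function names are all hypothetical. The deep learning layers and the CMLA classifier are not shown; the sketch stops at the integer encoding that a character-level embedding would consume.

```python
import string


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical vocabulary: 0 is reserved for padding, 1 for unknown characters.
VOCAB = {ch: i + 2 for i, ch in
         enumerate(string.ascii_lowercase + string.digits + "-._")}


def encode(domain: str, max_len: int = 32) -> list:
    """Map characters to integer ids and right-pad with zeros to max_len."""
    ids = [VOCAB.get(ch, 1) for ch in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def route(domain: str, known_domains: list, threshold: float = 0.75) -> tuple:
    """Stage 1: similarity check. Stage 2: hand off encoded input otherwise."""
    if known_domains:
        best = max(known_domains, key=lambda k: similarity(domain, k))
        if similarity(domain, best) >= threshold:
            return ("similarity_checker", best)
    # Not caught by the similarity stage: prepare input for the analyzer.
    return ("statistical_analyzer", encode(domain))


known = ["google.com", "example.org"]  # hypothetical reference list
for name in ("g00gle.com", "xjw3kq9zt1vb.net"):
    stage, payload = route(name, known)
    print(name, "->", stage)
```

Keeping the similarity stage first means that only domains it does not catch incur the cost of the encoding and the learned model, which matches the order of the two modules in the text.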
