MADMAX: Browser-Based Malicious Domain Detection Through Extreme Learning Machine

Fast and accurate malicious domain detection is an essential research theme in preventing cybercrime, and over the past decade machine learning has been an attractive approach for detecting unseen malicious domains. In this paper, we present MADMAX (MAchine learning-baseD MAlicious domain eXhauster), a browser-based application that leverages an extreme learning machine (ELM) for malicious domain detection. In contrast to existing work on ELM-based domain detection, MADMAX introduces two new methods: selection of optimized features based on permutation importance, which provides higher accuracy and throughput, and real-time training, which retrains the model with an updated malicious dataset for continuous malicious domain detection. We demonstrate that MADMAX outperforms the existing work with respect to accuracy and throughput by virtue of the optimized feature selection. Moreover, we confirm that a model with real-time training stably detects even unseen malicious domains, whereas the accuracy of a model without real-time training decreases when it encounters unseen domains. The source code of MADMAX is publicly available via GitHub.
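
As a rough illustration of the two ideas named in the abstract, the following minimal sketch trains a single-hidden-layer ELM (random hidden weights, output weights solved by a pseudoinverse) and ranks features by permutation importance. It is not the MADMAX implementation: the class `ELM`, the helper functions, the hidden-layer size, and the synthetic four-feature dataset are all assumptions made for illustration, and real domain features are not modeled here.

```python
import numpy as np

class ELM:
    """Hypothetical minimal extreme learning machine for binary labels."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights are the least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)

def accuracy(model, X, y):
    return float(np.mean(model.predict(X) == y))

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = accuracy(model, X, y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(base - accuracy(model, Xp, y))
        scores[j] = np.mean(drops)
    return scores

# Synthetic stand-in data (assumption): only feature 0 determines the label.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(400, 4))
y = (X[:, 0] > 0).astype(int)

model = ELM(n_hidden=64).fit(X, y)
importances = permutation_importance(model, X, y)
ranking = np.argsort(importances)[::-1]  # most important feature first
```

Features with low importance could then be dropped before refitting, which is one plausible way to trade a smaller feature set for throughput. Because fitting reduces to a single pseudoinverse solve, calling `fit` again on an updated dataset is cheap, which is the property that makes periodic retraining practical in a sketch like this.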
