Edge Computing Intelligence Using Robust Feature Selection for Network Traffic Classification in Internet-of-Things

Internet-of-Things (IoT) devices are densely interconnected and generate a massive amount of network traffic. Edge computing offers a new paradigm for monitoring and managing this traffic at the network’s edge. Network traffic classification is a critical task for monitoring and identifying Internet traffic. Recent work on traffic classification suggests using statistical flow features with machine learning techniques to classify network traffic accurately. The selected classification features must be stable and must remain effective across spatially and temporally heterogeneous datasets. This paper proposes a feature selection mechanism called Ensemble Weight Approach (EWA), which selects significant features for Internet traffic classification based on multi-criterion ranking and selection mechanisms. Extensive simulations have been conducted using publicly available traces from the University of Cambridge. The simulation results demonstrate that EWA is capable of identifying a stable feature subset for Internet traffic identification. EWA-selected features improve mean accuracy by up to 1.3% and reduce RMSE while using fewer features than other feature selection methods. The smaller number of features directly contributes to shorter classification time. Furthermore, the selected features can train stable traffic classification generative models irrespective of the dataset’s spatial and temporal differences, with consistent accuracy of up to 97%. Overall, the results indicate that EWA-selected statistical flow features can improve traffic classification.
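
To make the idea of multi-criterion ranking concrete, the following is a minimal sketch of weighted rank aggregation over several feature-scoring criteria. It is not the EWA algorithm itself, whose details the abstract does not give. The function names, feature names, criterion names, scores, and weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def rank_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Map raw scores to normalized ranks in [0, 1]; the best feature gets 1.0."""
    ordered = sorted(scores, key=scores.get)  # ascending: lowest score first
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {name: i / (n - 1) for i, name in enumerate(ordered)}


def ensemble_rank(
    criteria: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Combine per-criterion ranks with normalized weights; return the top-k features."""
    total = sum(weights.values())
    combined: Dict[str, float] = {}
    for criterion, scores in criteria.items():
        w = weights[criterion] / total
        for name, rank in rank_scores(scores).items():
            combined[name] = combined.get(name, 0.0) + w * rank
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined, key=lambda f: (-combined[f], f))[:k]


# Hypothetical scores for four flow features under two criteria (assumed values).
criteria = {
    "info_gain": {
        "server_port": 0.9,
        "pkt_size_mean": 0.6,
        "inter_arrival_var": 0.3,
        "flow_duration": 0.1,
    },
    "chi_square": {
        "server_port": 40.0,
        "pkt_size_mean": 55.0,
        "inter_arrival_var": 12.0,
        "flow_duration": 20.0,
    },
}
weights = {"info_gain": 2.0, "chi_square": 1.0}  # assumed weights

selected = ensemble_rank(criteria, weights, k=2)
print(selected)
```

Converting each criterion to normalized ranks before weighting keeps criteria with very different score scales (for example, information gain versus chi-square) comparable. This is one common design choice for combining rankings and is assumed here rather than taken from the paper.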
