Classifying SSH encrypted traffic with minimum packet header features using genetic programming

Classifying encrypted traffic, specifically Secure Shell (SSH), on the fly from network TCP traffic is a particularly challenging application domain for machine learning. Solutions should ideally be both simple - and therefore efficient to deploy - and accurate. Recent advances in team-based Genetic Programming (GP) provide the opportunity to decompose the original problem into a set of classifiers with non-overlapping behaviors, in effect providing further insight into the problem domain and increasing the throughput of solutions. In this work we therefore investigate the identification of SSH encrypted traffic based on packet header features, without using IP addresses, port numbers or payload data. C4.5 and AdaBoost - representing current best practice - are evaluated against the Symbiotic Bid-based (SBB) paradigm of team-based GP on data sets both common to and independent of the training condition. The results indicate that SBB-based GP solutions are capable of providing simpler solutions without sacrificing accuracy.
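To make the notion of header-only features concrete, the following is a minimal sketch of how per-flow statistics might be derived from packet headers alone. The `PacketHeader` record, the `flow_features` function and the particular statistics chosen are illustrative assumptions, not the feature set used in this work; the only property carried over from the text is that IP addresses, port numbers and payload are never consulted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class PacketHeader:
    # Hypothetical minimal record: IP addresses, port numbers and payload
    # are deliberately absent, mirroring the constraint stated above.
    timestamp: float  # arrival time in seconds
    length: int       # total packet length in bytes
    forward: bool     # True if sent by the flow initiator


def flow_features(packets: List[PacketHeader]) -> Dict[str, float]:
    """Summarise one flow using header-derived statistics only (illustrative)."""
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    fwd = [p.length for p in ordered if p.forward]
    bwd = [p.length for p in ordered if not p.forward]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "duration": ordered[-1].timestamp - ordered[0].timestamp,
        "fwd_packets": float(len(fwd)),
        "bwd_packets": float(len(bwd)),
        "fwd_mean_len": float(mean(fwd)) if fwd else 0.0,
        "bwd_mean_len": float(mean(bwd)) if bwd else 0.0,
        "mean_gap": float(mean(gaps)) if gaps else 0.0,
    }
```

A feature vector of this kind could then be handed to any of the classifiers compared here; which statistics are actually informative for SSH is an empirical question that this sketch does not address.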
