A Hybrid Classifier with a Binning Method for Network Application Identification

While interest in application identification continues to grow, the traditional approach based on transport-layer port numbers has become less effective for several reasons, including the increasing use of random or non-standard port numbers and of tunneling (e.g., HTTP tunnels). One way to overcome this is to inspect application payloads. Although highly accurate, payload inspection becomes limited and complicated when packets are encrypted or obfuscated. Another common approach is to classify applications using flow statistics such as flow size and duration. Because it does not need to read packet contents, this approach is not restricted to plain-text flows, but it is known to be less accurate. In this work, we develop a framework that combines these classification techniques to identify applications accurately and with greater flexibility. In particular, we present the design of a hybrid classifier that applies machine learning to both payload information and statistical flow-level features. On a recently collected traffic data set covering a diverse set of applications, our experiments show that the hybrid approach identifies applications with an average accuracy of 95%. In addition, we propose an optimization technique based on a novel binning method that partitions the given application set into multiple subgroups to improve overall identification accuracy.
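The abstract does not specify the learning algorithm, the payload features, or how the binning method chooses its subgroups. The Python sketch below shows one possible arrangement under explicit assumptions: payload information is reduced to 0/1 matches against a few hypothetical byte prefixes (`SIGNATURES`), flow size and duration are log-scaled, a nearest-centroid classifier stands in for the unspecified learner, and the application-to-subgroup partition (`app_to_bin`) is supplied by hand rather than computed. Stage one assigns a flow to a subgroup ("bin") and stage two identifies the application within that bin. All names and the toy data are illustrative, not taken from the paper.

```python
# Hypothetical sketch: hybrid features (payload + flow statistics) with a
# two-stage "binned" classifier. The signatures, the nearest-centroid
# learner, and the hand-made bin partition are all illustrative assumptions.
from dataclasses import dataclass
from math import dist, log10
from statistics import fmean


@dataclass
class Flow:
    payload: bytes      # leading payload bytes (may be encrypted or empty)
    total_bytes: int    # flow size in bytes
    duration_s: float   # flow duration in seconds


# Hypothetical payload signatures; each contributes one 0/1 feature.
SIGNATURES = (b"GET ", b"HTTP/", b"\x13BitTorrent", b"SSH-")


def features(flow: Flow) -> list[float]:
    """Concatenate payload-derived features with log-scaled flow statistics."""
    payload_part = [1.0 if flow.payload.startswith(s) else 0.0 for s in SIGNATURES]
    flow_part = [log10(1 + flow.total_bytes), log10(1 + flow.duration_s)]
    return payload_part + flow_part


class NearestCentroid:
    """Stand-in learner: predicts the label whose mean feature vector is closest."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        groups: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [fmean(col) for col in zip(*rows)] for label, rows in groups.items()
        }
        return self

    def predict(self, x: list[float]) -> str:
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


class BinnedClassifier:
    """Stage 1 predicts a bin (subgroup of applications); stage 2 the app in it."""

    def __init__(self, app_to_bin: dict[str, str]):
        self.app_to_bin = app_to_bin  # hypothetical partition, supplied by hand

    def fit(self, flows: list[Flow], apps: list[str]) -> "BinnedClassifier":
        X = [features(f) for f in flows]
        bins = [self.app_to_bin[a] for a in apps]
        self.stage1 = NearestCentroid().fit(X, bins)
        # One within-bin classifier per bin that has training flows.
        self.stage2 = {
            b: NearestCentroid().fit(
                [x for x, xb in zip(X, bins) if xb == b],
                [a for a, ab in zip(apps, bins) if ab == b],
            )
            for b in set(bins)
        }
        return self

    def predict(self, flow: Flow) -> str:
        x = features(flow)
        return self.stage2[self.stage1.predict(x)].predict(x)


# Toy training data (made up for illustration only).
train = [
    (Flow(b"GET /index.html", 20_000, 0.5), "web"),
    (Flow(b"HTTP/1.1 200 OK", 2_000_000, 3.0), "web"),
    (Flow(b"SSH-2.0-OpenSSH", 10_000, 900.0), "ssh"),
    (Flow(b"\x13BitTorrent protocol", 50_000_000, 600.0), "bittorrent"),
    (Flow(b"\x17\x03\x03", 30_000_000, 1200.0), "encrypted-p2p"),
]
bins = {"web": "interactive", "ssh": "interactive",
        "bittorrent": "bulk", "encrypted-p2p": "bulk"}

clf = BinnedClassifier(bins).fit([f for f, _ in train], [a for _, a in train])
print(clf.predict(Flow(b"GET /", 15_000, 1.0)))                          # web
print(clf.predict(Flow(b"\x13BitTorrent protocol", 40_000_000, 500.0)))  # bittorrent
print(clf.predict(Flow(b"\x17\x03\x03\x00", 25_000_000, 1000.0)))        # encrypted-p2p
```

In this toy setup the third query matches no payload signature, as with an encrypted flow, yet it is still assigned to an application from its flow statistics alone. Log-scaling keeps the heavy-tailed size and duration values from swamping the binary payload features. Whether the paper's binning method uses a two-stage scheme like this one, and how it chooses the subgroups, are assumptions of this sketch.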