Internet traffic classification demystified: myths, caveats, and the best practices

Recent research on Internet traffic classification algorithms has yielded a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no basis for consensus on which approach to use when, or how to interpret the results. In this work we critically revisit traffic classification by conducting a thorough evaluation of three classification approaches, based on transport layer ports, host behavior, and flow features. A strength of our work is the broad range of data against which we test the three classification approaches: seven traces with payload, collected in Japan, Korea, and the US. The diversity of geographic locations, link characteristics, and application traffic mixes in these data allowed us to evaluate the approaches under a wide variety of conditions. We analyze the advantages and limitations of each approach, evaluate methods to overcome the limitations, and extract insights and recommendations for both the study and the practical application of traffic classification. We make our software, classifiers, and data available for researchers interested in validating or extending this work.
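To make the three families of classifiers more concrete, here is a minimal Python sketch. It is not the authors' released software. It assumes a hypothetical `Flow` record and a deliberately tiny port table (`WELL_KNOWN_PORTS`). The names `classify_by_port`, `flow_features`, and `host_fanout` are illustrative only, and the features shown are simple examples rather than the feature sets evaluated in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny (protocol, port) -> application table.
# A real port-based classifier would use a much larger mapping, for
# example one derived from the IANA port number registry.
WELL_KNOWN_PORTS: Dict[Tuple[str, int], str] = {
    ("tcp", 21): "ftp",
    ("tcp", 25): "mail",
    ("udp", 53): "dns",
    ("tcp", 80): "web",
    ("tcp", 443): "web",
}


@dataclass
class Flow:
    """Hypothetical flow record: header fields plus per-packet sizes in bytes."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packet_sizes: List[int]


def classify_by_port(flow: Flow) -> str:
    """Port-based classification: label a flow by either endpoint's port."""
    # Check the destination port first (usually the server side), then the
    # source port, so that server-to-client flows are also matched.
    for port in (flow.dst_port, flow.src_port):
        app = WELL_KNOWN_PORTS.get((flow.proto, port))
        if app is not None:
            return app
    # Applications on non-standard or random ports end up here.
    return "unknown"


def flow_features(flow: Flow) -> Dict[str, float]:
    """A few payload-free statistics of the kind a flow-feature classifier could use."""
    sizes = flow.packet_sizes
    total = sum(sizes)
    return {
        "n_packets": len(sizes),
        "total_bytes": total,
        "mean_size": total / len(sizes) if sizes else 0.0,
        "max_size": max(sizes, default=0),
    }


def host_fanout(flows: List[Flow]) -> Dict[str, Tuple[int, int]]:
    """Host-behavior signal: (distinct peers, distinct destination ports) per source host."""
    peers = defaultdict(set)
    ports = defaultdict(set)
    for f in flows:
        peers[f.src_ip].add(f.dst_ip)
        ports[f.src_ip].add(f.dst_port)
    return {host: (len(peers[host]), len(ports[host])) for host in peers}


if __name__ == "__main__":
    # Example flows use documentation address ranges (RFC 5737).
    flows = [
        Flow("tcp", "10.0.0.1", 51512, "192.0.2.10", 443, [60, 1500, 1500, 52]),
        Flow("udp", "10.0.0.1", 50000, "192.0.2.53", 53, [80]),
        Flow("tcp", "10.0.0.2", 40213, "198.51.100.7", 38122, [120, 300]),
    ]
    for f in flows:
        print(classify_by_port(f), flow_features(f))
    print(host_fanout(flows))
```

In this sketch the first two example flows are labeled `web` and `dns`, and the third is labeled `unknown` because port 38122 is not in the table. That gap is a commonly cited limitation of port-based classification. Host-behavior and flow-feature signals do not rely on a port table, but they depend on heuristics or a trained model instead.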
