A survey of techniques for internet traffic classification using machine learning

The research community has begun looking for IP traffic classification techniques that do not rely on 'well known' TCP or UDP port numbers or on interpreting the contents of packet payloads. New work is emerging on the use of statistical traffic characteristics to assist in the identification and classification process. This survey paper looks at emerging research into the application of Machine Learning (ML) techniques to IP traffic classification, an interdisciplinary blend of IP networking and data mining techniques. We provide context and motivation for the application of ML techniques to IP traffic classification, and review 18 significant works that cover the dominant period from 2004 to early 2007. These works are categorized and reviewed according to their choice of ML strategies and primary contributions to the literature. We also discuss a number of key requirements for the deployment of ML-based traffic classifiers in operational IP networks, and qualitatively critique the extent to which the reviewed works meet these requirements. Open issues and challenges in the field are also discussed.
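The core idea of classifying flows from statistical traffic characteristics, rather than from port numbers or payload contents, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from any of the reviewed works. The per-flow features (mean and standard deviation of packet size, mean inter-arrival time), the hand-rolled Gaussian naive Bayes classifier, the class labels "bulk" and "interactive", and the toy flows are all hypothetical choices made only for this example.

```python
import math
from statistics import mean, pstdev


def flow_features(pkt_sizes, arrival_times):
    """Hypothetical feature set: uses only packet sizes and timing, no ports or payload."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return [mean(pkt_sizes), pstdev(pkt_sizes), mean(gaps) if gaps else 0.0]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features assumed independent given the class."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            # Per-feature (mean, variance); a small floor avoids zero variance.
            self.stats[c] = [(mean(col), pstdev(col) ** 2 + 1e-6) for col in zip(*rows)]
        return self

    def _log_posterior(self, x, c):
        # log P(c) + sum of per-feature Gaussian log-likelihoods (unnormalized).
        total = math.log(self.priors[c])
        for v, (mu, var) in zip(x, self.stats[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total

    def predict(self, x):
        return max(self.classes, key=lambda c: self._log_posterior(x, c))


# Invented toy flows: (packet sizes in bytes, arrival times in seconds, label).
training_flows = [
    ([1500, 1500, 1400, 1500], [0.0, 0.001, 0.002, 0.003], "bulk"),
    ([1460, 1500, 1500, 1420], [0.0, 0.002, 0.003, 0.005], "bulk"),
    ([60, 80, 64, 90], [0.0, 0.2, 0.45, 0.6], "interactive"),
    ([70, 100, 60, 66], [0.0, 0.3, 0.5, 0.9], "interactive"),
]
X = [flow_features(sizes, times) for sizes, times, _ in training_flows]
y = [label for _, _, label in training_flows]
clf = GaussianNaiveBayes().fit(X, y)

print(clf.predict(flow_features([1500, 1480, 1500, 1440], [0.0, 0.001, 0.003, 0.004])))
print(clf.predict(flow_features([64, 90, 70, 70], [0.0, 0.25, 0.5, 0.7])))
```

Naive Bayes is used here only because it is short to write from scratch. With four training flows the estimated variances are unrealistically small, so this shows the mechanics of feature extraction and supervised classification rather than anything about accuracy on real traffic.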
