Efficient application identification and the temporal and spatial stability of classification schema

Motivated by the importance of accurate identification for a range of applications, this paper compares the effectiveness and efficiency of classifying network-based applications from behavioral observations of network traffic with classification by deep-packet inspection. Importantly, throughout our work we are able to compare against data with an accurate, independently determined ground truth that describes the actual applications causing the observed network traffic. In a unique study spanning both the spatial domain (comparing across different network locations) and the temporal domain (comparing across a number of years of data), we illustrate the decay in classification accuracy across a range of application-classification mechanisms. Further, we document the accuracy of classification across network locations when the training data lacks spatial diversity. Finally, we illustrate the classification of UDP traffic, using the same classification approach for both stateful flows (TCP) and stateless flows based upon UDP. Importantly, we demonstrate high levels of accuracy: greater than 92% in the worst case, regardless of the application.
