SANTaClass: A Self Adaptive Network Traffic Classification system

A critical aspect of network management from an operator's perspective is the ability to understand or classify all traffic that traverses the network. The failure of port based traffic classification technique triggered an interest in discovering signatures based on packet content. However, this approach involves manually reverse engineering all the applications/protocols that need to be identified. This suffers from the problem of scalability; keeping up with the new applications that come up everyday is very challenging and time-consuming. Moreover, traditional approach of developing signatures once and using them in different networks suffers from low coverage. In this work, we present a novel fully automated packet payload content (PPC) based network traffic classification system that addresses the above shortcomings. Our system learns new application signatures in the network where classification is desired. Further more, our system adapts the signatures as the traffic for an application changes. Based on real traces from several service providers, we show that our system is capable of detecting (1) tunneled or wrapped applications, (2) applications that use random ports, and (3) new applications. Moreover, it is robust to routing asymmetry, an important requirement in large ISPs, and has a very high (>99.5%) detection rate. Finally, our system is easy to deploy and setup and performs classification in real-time.

[1]  Sebastian Zander,et al.  Self-Learning IP Traffic Classification Based on Statistical Flow Characteristics , 2005, PAM.

[2]  Wanlei Zhou,et al.  Generating regular expression signatures for network traffic classification in trusted network management , 2012, J. Netw. Comput. Appl..

[3]  Li Guo,et al.  A semantics aware approach to automated reverse engineering unknown protocols , 2012, 2012 20th IEEE International Conference on Network Protocols (ICNP).

[4]  Marco Mellia,et al.  DNS to the rescue: discerning content and services in a tangled web , 2012, IMC '12.

[5]  Konstantina Papagiannaki,et al.  Toward the Accurate Identification of Network Applications , 2005, PAM.

[6]  Oliver Spatscheck,et al.  Accurate, scalable in-network identification of p2p traffic using application signatures , 2004, WWW '04.

[7]  Stefan Savage,et al.  Unexpected means of protocol inference , 2006, IMC '06.

[8]  Michalis Faloutsos,et al.  SubFlow: Towards practical flow-level traffic classification , 2012, 2012 Proceedings IEEE INFOCOM.

[9]  James Won-Ki Hong,et al.  Traffic Classification Based on Flow Similarity , 2009, IPOM.

[10]  Antonio Nucci,et al.  CUTE: Traffic Classification Using TErms , 2012, 2012 21st International Conference on Computer Communications and Networks (ICCCN).

[11]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.

[12]  Ming-Yang Kao,et al.  Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[13]  Sebastian Zander,et al.  Automated traffic classification and application identification using machine learning , 2005, The IEEE Conference on Local Computer Networks 30th Anniversary (LCN'05)l.

[14]  James Won-Ki Hong,et al.  Towards automated application signature generation for traffic identification , 2008, NOMS 2008 - 2008 IEEE Network Operations and Management Symposium.

[15]  B. Karp,et al.  Autograph: Toward Automated, Distributed Worm Signature Detection , 2004, USENIX Security Symposium.

[16]  Anirban Mahanti,et al.  Traffic classification using clustering algorithms , 2006, MineNet '06.

[17]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[18]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[19]  Renata Teixeira,et al.  Early application identification , 2006, CoNEXT '06.

[20]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.