High-Speed Flow Nature Identification

This paper is the first to address the fundamental problem of identifying the content nature of a flow, namely whether it is text, binary, or encrypted. We propose Iustitia, a tool for identifying flow nature on the fly. The key observation behind Iustitia is that text flows have the lowest entropy, encrypted flows have the highest, and binary flows fall in between. The basic idea of Iustitia is to classify flows using machine learning techniques, where each feature is the entropy computed over a given number of consecutive bytes. The key features of Iustitia are high speed (10% of the average packet inter-arrival time) and high accuracy (86%).
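To make the entropy feature concrete, the following is a minimal sketch under one possible reading of the description above: each feature is the Shannon entropy of the distribution of k consecutive bytes (k-grams) in a flow prefix, for k = 1, 2, and so on. The function names `kgram_entropy` and `entropy_features`, the sliding-window choice, and the default `max_k` are illustrative assumptions, not the Iustitia implementation.

```python
import math
from collections import Counter


def kgram_entropy(data: bytes, k: int = 1) -> float:
    """Shannon entropy, in bits, of the k-grams seen in `data`.

    Assumption: k-grams are taken with a sliding window of step 1.
    Returns 0.0 when `data` is too short to contain a single k-gram.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    total = len(data) - k + 1
    if total <= 0:
        return 0.0
    # Count how often each run of k consecutive bytes occurs.
    counts = Counter(data[i:i + k] for i in range(total))
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(data: bytes, max_k: int = 3) -> list[float]:
    """Hypothetical feature vector: the k-gram entropy for k = 1..max_k."""
    return [kgram_entropy(data, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    text_like = b"the quick brown fox jumps over the lazy dog " * 8
    uniform = bytes(range(256))  # every byte value exactly once
    print(entropy_features(text_like))
    print(entropy_features(uniform))
```

With k = 1, a buffer made of a single repeated byte has entropy 0 and a buffer containing every byte value equally often has entropy 8 bits, which bounds the range in which text, binary, and encrypted content would be expected to fall. A vector of this kind could then be passed to any standard classifier; the choice of classifier is outside this sketch.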
