Clear and Present Data: Opaque Traffic and its Security Implications for the Future

Opaque traffic, i.e., traffic that is compressed or encrypted, incurs particularly high overhead for deep packet inspection engines and often yields little or no useful information. Our experiments indicate that an astonishing 89% of payload-carrying TCP packets, and 86% of bytes transmitted, are opaque, forcing us to consider the challenges this class of traffic presents for network security, both in the short term and, as the proportion of opaque traffic continues to rise, in the future. We provide a first step toward addressing some of these challenges by introducing new techniques for accurate real-time winnowing, or filtering, of such traffic, based on the intuition that the distribution of byte values found in opaque traffic will differ greatly from that found in transparent traffic. Evaluation on traffic from two campuses reveals that our techniques identify opaque data with 95% accuracy, on average, while examining fewer than 16 bytes of payload data. We implemented our most promising technique as a preprocessor for the Snort IDS and compared its performance to that of a stock Snort instance by running both instances live, on identical traffic streams, using a Data Acquisition and Generation (DAG) card deployed within a campus network. Winnowing enabled Snort to handle a peak load of 1.2 Gbps with zero percent packet loss and to process almost one hundred billion packets over 24 hours, a 147% increase over the number processed by the stock Snort instance. This increase in capacity resulted in 33,000 additional alerts that would otherwise have been missed.
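The abstract states the intuition behind winnowing but not the test itself. As a rough illustration only, the byte-distribution idea could be sketched as a truncated sequential probability ratio test over the first few payload bytes. The function name `is_opaque`, the choice of "byte value below 128" as the observed feature, and every probability and error rate below are assumptions made for this sketch; they are not the authors' parameters or their Snort preprocessor.

```python
import math


def is_opaque(payload: bytes, max_bytes: int = 16,
              p_opaque: float = 0.5, p_transparent: float = 0.85,
              alpha: float = 0.005, beta: float = 0.005) -> bool:
    """Hypothetical truncated sequential test on byte values.

    H0 (opaque): byte values are roughly uniform, so a byte is below 128
    with probability p_opaque.
    H1 (transparent): the payload is mostly ASCII-like, so a byte is below
    128 with probability p_transparent.
    All default values are illustrative assumptions.
    """
    # Wald's decision thresholds for the log-likelihood ratio of H1 to H0.
    upper = math.log((1 - beta) / alpha)   # at or above: decide transparent
    lower = math.log(beta / (1 - alpha))   # at or below: decide opaque

    llr = 0.0
    for b in payload[:max_bytes]:
        if b < 128:
            llr += math.log(p_transparent / p_opaque)
        else:
            llr += math.log((1 - p_transparent) / (1 - p_opaque))
        if llr >= upper:
            return False
        if llr <= lower:
            return True

    # Byte budget exhausted without crossing a threshold: fall back on the
    # sign of the accumulated ratio (an empty payload counts as transparent).
    return llr < 0


if __name__ == "__main__":
    print(is_opaque(b"GET /index.html HTTP/1.1\r\n"))  # ASCII request line
    print(is_opaque(bytes([0xFF] * 16)))               # all high-bit bytes
```

The sequential form is what lets a decision be reached after only a handful of bytes: the loop stops as soon as the evidence crosses either threshold, and the `max_bytes` cap bounds the per-packet cost. A real deployment would need thresholds and byte-level features tuned on labeled traffic.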
