Website Fingerprinting and Identification Using Ordered Feature Sequences

We consider website fingerprinting over encrypted and proxied channels. It has been shown that information on packet sizes is sufficient to achieve good identification accuracy. Recently, traffic morphing [1] was proposed to thwart website fingerprinting by changing the packet size distribution so as to mimic that of some other website, while minimizing bandwidth overhead. In this paper, we point out that packet ordering information, though noisy, can be utilized to enhance website fingerprinting. In addition, traces of the ordering information remain even under traffic morphing, and they can be extracted for identification. When web access is performed over OpenSSH with 2000 profiled websites, the identification accuracy of our scheme reaches 81%, which is 11% better than Liberatore and Levine's scheme presented in CCS'06 [2]. We are able to identify 78% of the morphed traffic among 2000 websites, while Liberatore and Levine's scheme identifies only 52%. Our analysis suggests that an effective countermeasure to website fingerprinting should not only hide the packet size distribution, but also aggressively remove the ordering information.
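As a rough illustration of why ordering can matter beyond the size distribution, the sketch below compares packet-size sequences with Levenshtein edit distance and picks the closest profile. The abstract does not describe the scheme's actual features or distance measure, so everything here is an assumption made for illustration: the signed-size encoding (negative for download, positive for upload), the helper names `edit_distance` and `identify`, the length normalization, and the toy profiles.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two packet-size sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def identify(trace: Sequence[int], profiles: Dict[str, List[int]]) -> str:
    """Return the profiled site whose sequence is closest to the trace."""
    def score(site: str) -> float:
        ref = profiles[site]
        # Normalize by the longer length so long pages are not penalized.
        return edit_distance(trace, ref) / max(len(trace), len(ref), 1)
    return min(profiles, key=score)


# Hypothetical profiles: the same multiset of sizes in a different order.
profiles = {
    "site-a": [-1500, -1500, 300, -800],
    "site-b": [300, -1500, -800, -1500],
}
observed = [-1500, -1500, 310, -800]  # noisy observation of site-a
print(identify(observed, profiles))
```

Both toy profiles have identical packet-size distributions, so a classifier that looks only at the distribution has nothing to separate them with, whereas the ordered comparison does.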

[1] Edward W. Felten et al. Timing attacks on Web privacy, 2000, CCS.

[2] Dawn Xiaodong Song et al. Timing Analysis of Keystrokes and Timing Attacks on SSH, 2001, USENIX Security Symposium.

[3] Andrew Hintz et al. Fingerprinting Websites Using Traffic Analysis, 2002, Privacy Enhancing Technologies.

[4] Michael J. Fischer et al. The String-to-String Correction Problem, 1974, JACM.

[5] Marco Mellia et al. Revealing skype traffic: when randomness plays with you, 2007, SIGCOMM 2007.

[6] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[7] W. Heeringa et al. Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data, 2004, Language Variation and Change.

[8] S. B. Needleman et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of molecular biology.

[9] Keith J. Jones et al. 10th USENIX Security Symposium, 2001, login Usenix Mag.

[10] Charles V. Wright et al. Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis, 2009, NDSS.

[11] David D. Jensen et al. Privacy Vulnerabilities in Encrypted HTTP Streams, 2005, Privacy Enhancing Technologies.

[12] Gonzalo Navarro et al. A guided tour to approximate string matching, 2001, CSUR.

[13] Brian Neil Levine et al. Inferring the source of encrypted HTTP connections, 2006, CCS '06.

[14] Charles V. Wright et al. On Inferring Application Protocol Behaviors in Encrypted Network Traffic, 2006, J. Mach. Learn. Res.

[15] Lili Qiu et al. Statistical identification of encrypted Web browsing traffic, 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[16] Tadayoshi Kohno et al. Devices That Tell on You: Privacy Trends in Consumer Ubiquitous Computing, 2007, USENIX Security Symposium.

[17] Dan Gusfield et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.