Statistical Identification of Encrypted Web Browsing Traffic

Encryption is often proposed as a tool for protecting the privacy of World Wide Web browsing. However, encryption (particularly as typically implemented in, or in concert with, popular Web browsers) does not hide all information about the encrypted plaintext. Specifically, HTTP object count and sizes are often revealed (or at least incompletely concealed). We investigate the identifiability of World Wide Web traffic based on this unconcealed information in a large sample of Web pages, and show that it suffices to identify a significant fraction of them quite reliably. We also suggest some possible countermeasures against the exposure of this kind of information and experimentally evaluate their effectiveness.
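To make the idea concrete, the following is a minimal sketch of how object counts and sizes alone might be matched against a library of known pages, and how a padding countermeasure blurs that match. The abstract does not specify the matching procedure, so everything here is an assumption made for illustration: the multiset Jaccard similarity, the 0.7 threshold, the 4096-byte padding block, and all function names, page names, and sizes are hypothetical.

```python
from collections import Counter


def signature(object_sizes):
    """Multiset of observed object sizes (in bytes) for one page load."""
    return Counter(object_sizes)


def similarity(a, b):
    """Multiset Jaccard similarity |a & b| / |a | b|, in [0, 1] (assumed metric)."""
    union = sum((a | b).values())
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union


def identify(observed_sizes, library, threshold=0.7):
    """Return the best-matching page name, or None if no score reaches the threshold."""
    observed = signature(observed_sizes)
    best_name, best_score = None, 0.0
    for name, sizes in library.items():
        score = similarity(observed, signature(sizes))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def pad(size, block=4096):
    """Hypothetical countermeasure: round a size up to a multiple of `block`."""
    return -(-size // block) * block


# Hypothetical library of page signatures (object sizes in bytes).
library = {
    "news_front_page": [14210, 3120, 3120, 880, 45012],
    "webmail_login": [5100, 880, 2210],
}

# Sizes an eavesdropper might infer from an encrypted page load.
observed = [14210, 3120, 880, 45012, 3120]
print(identify(observed, library))

# Padding makes distinct pages look more alike to this matcher.
plain = similarity(signature(library["news_front_page"]),
                   signature(library["webmail_login"]))
padded = similarity(signature(pad(s) for s in library["news_front_page"]),
                    signature(pad(s) for s in library["webmail_login"]))
print(plain, padded)
```

In this toy setting the observed load matches one library entry exactly, while padding raises the similarity between the two distinct pages. That is the direction a size-hiding countermeasure aims for, though how well it works on real traffic is an empirical question this sketch does not answer.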
