On the Privacy of Web Search Based on Query Obfuscation: A Case Study of TrackMeNot

Web search is one of the most rapidly growing applications on the Internet today. However, the current practice of most search engines, logging and analyzing users' queries, raises serious privacy concerns. One viable solution to search privacy is query obfuscation, whereby client-side software attempts to mask real user queries by injecting noisy queries. In contrast to other privacy-preserving search mechanisms, query obfuscation requires neither server-side modifications nor third-party infrastructure, and so can be deployed readily at the discretion of privacy-conscious users. In this paper, our higher-level goal is to analyze whether query obfuscation can preserve users' privacy in practice against an adversarial search engine. We focus on TrackMeNot (TMN) [10,20], a popular search privacy tool based on the principle of query obfuscation. We demonstrate that a search engine equipped with only a short-term history of a user's search queries can break the privacy guarantees of TMN using only off-the-shelf machine learning classifiers.

[1] Ian H. Witten, et al. Data mining - practical machine learning tools and techniques, Second Edition, 2005, The Morgan Kaufmann series in data management systems.

[2] Hao Chen, et al. Noise Injection for Search Privacy Protection, 2009, 2009 International Conference on Computational Science and Engineering.

[3] Ian Witten, et al. Data Mining, 2000.

[4] Massimo Barbaro, et al. A Face Is Exposed for AOL Searcher No, 2006.

[5] Ravi Kumar, et al. "I know what you did last summer": query logs and user privacy, 2007, CIKM '07.

[6] Jae C. Hong, et al. Google Resists U.S. Subpoena of Search Data, 2006.

[7] Ian H. Witten, et al. Data mining: practical machine learning tools and techniques, 3rd Edition, 1999.

[8] Joan Feigenbaum, et al. Private web search, 2007, WPES '07.

[9] Ravi Kumar, et al. Vanity fair: privacy in querylog bundles, 2008, CIKM '08.

[10] Rafail Ostrovsky, et al. Replication is not needed: single database, computationally-private information retrieval, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[11] Philippe Golle, et al. Faking contextual data for fun, profit, and privacy, 2009, WPES '09.