Web Search Query Privacy, an End-User Perspective

While search engines have become vital tools for finding information on the Internet, privacy remains a growing concern because search engines have the technical ability to retain user search logs. Although such capabilities can provide enhanced, personalized search results, the confidentiality of user intent remains uncertain. Even with web search query obfuscation techniques, a further challenge remains: reusing the same obfuscation methods is problematic, given that search engines have enormous computation and storage resources for query disambiguation. Moreover, a number of web search query privacy procedures require the cooperation of the search engine, a non-trusted entity in such cases, which makes query obfuscation even more challenging. In this study, we first review how search engines work with regard to web search queries and user intent. Second, we present this material in a manner accessible to those outside computer science, with the intent of introducing knowledge of web search engines so that non-computer scientists can approach web search query privacy innovatively. As a contribution, we identify and highlight areas open to further investigative and innovative research on end-user personalized web search privacy, that is, methods that can be executed on the user side without the involvement of third parties such as search engines. The goal is to motivate future web search obfuscation heuristics that give users control over their personal search privacy.
