Embellishing text search queries to protect user privacy

Users of text search engines are increasingly wary that their search activities may disclose confidential information about their business or personal profiles. It would be desirable for a search engine to perform document retrieval for users while protecting their intent. In this paper, we identify the privacy risks arising from semantically related search terms within a query, and from recurring high-specificity query terms within a search session. To counter these risks, we propose a solution that enables a similarity text retrieval system to offer anonymity and plausible deniability for the query terms, and hence for the user intent, without degrading the system's precision-recall performance. The solution comprises a mechanism that embellishes each user query with decoy terms that exhibit a specificity spread similar to that of the genuine terms, but point to plausible alternative topics. We also provide an accompanying retrieval scheme that enables the search engine to compute the encrypted document relevance scores from only the genuine search terms, while remaining oblivious to which terms are genuine and which are decoys. Empirical evaluation results are presented to substantiate the effectiveness of our solution.
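
To make the embellishment idea concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes that term specificity can be approximated by inverse document frequency (IDF) and that candidate decoys are already grouped into topics. The names `idf`, `embellish`, `topic_buckets`, and `doc_freq`, as well as the toy document frequencies, are hypothetical and chosen only for illustration. The encrypted scoring step of the retrieval scheme is not shown.

```python
import math
import random


def idf(term, doc_freq, num_docs):
    """Inverse document frequency, used here as a stand-in for specificity."""
    return math.log(num_docs / doc_freq[term])


def embellish(genuine_terms, topic_buckets, doc_freq, num_docs, rng):
    """Mix the genuine terms with one decoy per genuine term.

    All decoys come from a single alternative topic, so they point to a
    plausible alternative intent. Each decoy is the unused candidate whose
    IDF is closest to that of the genuine term it shadows, so the decoys
    show a similar spread of specificity.
    """
    genuine = set(genuine_terms)
    # Keep only topics that share no term with the genuine query and that
    # hold enough candidates to supply one decoy per genuine term.
    eligible = {
        topic: terms
        for topic, terms in topic_buckets.items()
        if not genuine & set(terms) and len(terms) >= len(genuine_terms)
    }
    topic = rng.choice(sorted(eligible))
    pool = list(eligible[topic])

    decoys = []
    for term in genuine_terms:
        target = idf(term, doc_freq, num_docs)
        best = min(pool, key=lambda t: abs(idf(t, doc_freq, num_docs) - target))
        pool.remove(best)
        decoys.append(best)

    mixed = list(genuine_terms) + decoys
    rng.shuffle(mixed)  # hide which positions hold genuine terms
    return mixed, decoys


# Toy corpus statistics (hypothetical numbers).
num_docs = 10_000
doc_freq = {
    "osteosarcoma": 12, "radiotherapy": 150,
    "stalagmite": 10, "cave": 180, "limestone": 90,
}
topic_buckets = {
    "oncology": ["osteosarcoma", "radiotherapy"],
    "geology": ["stalagmite", "cave", "limestone"],
}

query, decoys = embellish(
    ["osteosarcoma", "radiotherapy"],
    topic_buckets, doc_freq, num_docs, random.Random(0),
)
print(query, decoys)
```

In this toy setup the rare genuine term is paired with a rare decoy and the more common genuine term with a more common decoy, so an observer who sees only the mixed query cannot single out the genuine terms by specificity alone. A real system would need a far larger decoy vocabulary and a principled way of grouping terms by topic.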
