Zerber+R: top-k retrieval from a confidential index

Privacy-preserving document exchange among collaboration groups in an enterprise as well as across enterprises requires techniques for sharing and search of access-controlled information through largely untrusted servers. In these settings search systems need to provide confidentiality guarantees for shared information while offering IR properties comparable to the ordinary search engines. Top-k is a standard IR technique which enables fast query execution on very large indexes and makes systems highly scalable. However, indexing access-controlled information for top-k retrieval is a challenging task due to the sensitivity of the term statistics used for ranking. In this paper we present Zerber+R -- a ranking model which allows for privacy-preserving top-k retrieval from an outsourced inverted index. We propose a relevance score transformation function which makes relevance scores of different terms indistinguishable, such that even if stored on an untrusted server they do not reveal information about the indexed data. Experiments on two real-world data sets show that Zerber+R makes economical usage of bandwidth and offers retrieval properties comparable with an ordinary inverted index.

[1]  Dan Suciu,et al.  Controlling Access to Published Data Using Cryptography , 2003, VLDB.

[2]  Marianne Winslett,et al.  Trustworthy keyword search for regulatory-compliant records retention , 2006, VLDB.

[3]  Qian Wang,et al.  Plutus: Scalable Secure File Sharing on Untrusted Storage , 2003, FAST.

[4]  de,et al.  Prototype Demonstration: Using Link Analysis to Identify Aspects in Faceted Web Search , 2006 .

[5]  Jaideep Vaidya,et al.  Privacy-preserving indexing of documents on the network , 2003, The VLDB Journal.

[6]  Elisa Bertino,et al.  Securing XML Documents with Author-X , 2001, IEEE Internet Comput..

[7]  Michael Mitzenmacher,et al.  Privacy Preserving Keyword Searches on Remote Encrypted Data , 2005, ACNS.

[8]  H. Sorenson,et al.  Nonlinear Bayesian estimation using Gaussian sum approximations , 1972 .

[9]  Hakan Hacigümüs,et al.  Executing SQL over encrypted data in the database-service-provider model , 2002, SIGMOD '02.

[10]  Víctor Pàmies,et al.  Open Directory Project , 2003 .

[11]  Rafail Ostrovsky,et al.  Public Key Encryption with Keyword Search , 2004, EUROCRYPT.

[12]  Guan-Ming Su,et al.  Confidentiality-preserving rank-ordered search , 2007, StorageSS '07.

[13]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[14]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[15]  Hovav Shacham,et al.  SiRiUS: Securing Remote Untrusted Storage , 2003, NDSS.

[16]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[17]  Marianne Winslett,et al.  Zerber: r-confidential indexing for distributed documents , 2008, EDBT '08.

[18]  J. Rice Mathematical Statistics and Data Analysis , 1988 .

[19]  Dawn Xiaodong Song,et al.  Practical techniques for searches on encrypted data , 2000, Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000.

[20]  Charles L. A. Clarke,et al.  A security model for full-text file system search in multi-user environments , 2005, FAST'05.