PrivatePond: Outsourced Management of Web Corpuses

With the rise of cloud computing, it is increasingly attractive for end-users (organizations and individuals) to outsource the management of their data to a small number of large-scale service providers. In this paper, we consider a user who wants to outsource the storage and search of a corpus of web documents (e.g., an intranet). At the same time, the corpus may contain confidential documents that the user does not want to reveal to the service provider. While past work has considered the problems of secure keyword search and secure indexing, all of the proposed tools require significant modifications to existing search engines and infrastructure. We propose a system called PrivatePond, which allows confidential outsourced web search using an unmodified search engine. The system is built around the central idea of a secure indexable representation, which is attached to each document in the corpus and constructed to balance confidentiality and searchability. A secure local proxy provides transparency to the end-user. While the idea of a secure indexable representation is very general, we propose a preliminary instantiation of it that provides practical confidentiality. An experimental evaluation indicates that this indexable representation can provide high-quality search and ranking, similar to that obtained using the unmodified corpus.
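To make the division of labor concrete (indexable representation attached to each document, local proxy translating queries, unmodified engine in between), here is a minimal sketch under deliberately simple assumptions. It is not PrivatePond's instantiation. It assumes a keyed HMAC-SHA256 tokenization of terms, and every name in it (`SECRET_KEY`, `term_token`, `indexable_representation`, `rewrite_query`, `toy_search`) is hypothetical. Because the tokens are deterministic, term frequencies and co-occurrences remain visible to the provider, so this scheme on its own illustrates only the searchability side of the confidentiality/searchability trade-off.

```python
import hashlib
import hmac
import re
from collections import Counter

# HYPOTHETICAL secret key: held only by the trusted local proxy, never sent to the provider.
SECRET_KEY = b"example-key-held-by-local-proxy"


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_token(term: str, key: bytes = SECRET_KEY) -> str:
    """Map a plaintext term to an opaque, deterministic token using HMAC-SHA256."""
    digest = hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return "t" + digest[:16]  # truncated for brevity; the prefix keeps tokens word-like


def indexable_representation(document: str) -> str:
    """Replace every term with its token. Repeated terms stay repeated, so an engine
    that ranks by term frequency still can (this is also an information leak)."""
    return " ".join(term_token(t) for t in tokenize(document))


def rewrite_query(query: str) -> str:
    """Proxy side: translate a plaintext query into the same token space."""
    return " ".join(term_token(t) for t in tokenize(query))


def toy_search(index: dict[str, str], query: str) -> list[str]:
    """Stand-in for an unmodified engine: rank documents by summed raw term frequency."""
    query_terms = query.split()
    scores = {}
    for doc_id, text in index.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


# Usage: the provider stores only the token strings; plaintext never leaves the proxy.
docs = {
    "memo1": "Quarterly budget review: budget cuts planned",
    "memo2": "Team offsite agenda and budget",
    "memo3": "Holiday schedule",
}
index = {doc_id: indexable_representation(text) for doc_id, text in docs.items()}
print(toy_search(index, rewrite_query("budget")))  # ['memo1', 'memo2']
```

The engine never sees the word "budget", yet it still returns both matching memos and ranks `memo1` first because the term occurs there twice. A real design would need to blunt exactly that frequency signal without destroying ranking quality.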
