Efficient tree pattern queries on encrypted XML documents

Outsourcing XML documents is challenging because the documents must be encrypted, yet queries over them must still be processed efficiently. Past approaches to this problem either leak structural information or fail to support search with constraints on XML node content. In addition, they adopt a filtering-and-refining framework, which requires users to prune false positives from the query results. To address these problems, we present a solution for efficiently evaluating tree pattern queries (TPQs) on encrypted XML documents. We create a domain hierarchy in which every XML document can be embedded. By assigning each node in the hierarchy a position, we create for each document a vector that encodes both its structural and its textual information. A vector is likewise created for each TPQ. Matching a TPQ against a document then reduces to calculating the distance between their vectors. For privacy, these vectors are encrypted before being outsourced. To improve matching efficiency, we use a k-d tree to partition the vectors into non-overlapping subsets, so that non-matching documents are pruned as early as possible. An extensive evaluation shows that our solution is efficient and scales to large datasets.
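
The encode-match-prune pipeline described above can be illustrated with a small plaintext sketch. It is not the paper's construction; it rests on several stated assumptions. Documents and queries are toy (label, children) trees with text values as leaf labels. The "domain hierarchy" is simplified to the set of distinct root-to-node label paths, each given a fixed vector position. Vectors are binary. A query is taken to match a document when every hierarchy node it uses also occurs in the document, which for binary vectors q and d holds exactly when ||d - q||^2 = ||d||^2 - ||q||^2. That identity is how the sketch turns matching into a distance calculation. The k-d-tree-style index splits binary coordinates at 0.5, so each split yields disjoint subsets, and a subtree of documents lacking node i is skipped whenever the query uses node i. Every function name (`paths`, `build_hierarchy`, `to_vector`, `matches`, `build_kd`, `search`, `evaluate`) is hypothetical, and the encryption layer is omitted entirely.

```python
from itertools import chain

# Toy XML trees are (label, [children]); text values appear as leaf labels,
# so a content constraint such as author = "Smith" is just another path.


def paths(tree, prefix=()):
    """Yield every root-to-node label path of a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from paths(child, path)


def build_hierarchy(docs):
    """Hypothetical domain hierarchy: give each distinct path a fixed position."""
    all_paths = sorted(set(chain.from_iterable(paths(d) for d in docs)))
    return {p: i for i, p in enumerate(all_paths)}


def to_vector(tree, position):
    """Binary vector with a 1 at every hierarchy position the tree uses.

    Returns None if the tree uses a path outside the hierarchy, since such
    a query cannot match any indexed document.
    """
    v = [0] * len(position)
    for p in paths(tree):
        if p not in position:
            return None
        v[position[p]] = 1
    return v


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def matches(q, d):
    """For binary q, d: q's 1s are a subset of d's iff ||d-q||^2 == |d| - |q|."""
    return sq_dist(d, q) == sum(d) - sum(q)


def build_kd(items, dim, n_dims, leaf_size=2):
    """k-d-tree-style split of (doc_id, vector) pairs on coordinate `dim`.

    Binary coordinates are split at threshold 0.5, so the children hold
    disjoint subsets: documents without / with hierarchy node `dim`.
    """
    if len(items) <= leaf_size or dim >= n_dims:
        return ("leaf", items)
    left = [it for it in items if it[1][dim] == 0]
    right = [it for it in items if it[1][dim] == 1]
    return ("split", dim,
            build_kd(left, dim + 1, n_dims, leaf_size),
            build_kd(right, dim + 1, n_dims, leaf_size))


def search(node, q):
    """Return ids of matching documents, pruning subtrees that cannot match."""
    if node[0] == "leaf":
        return [doc_id for doc_id, d in node[1] if matches(q, d)]
    _, dim, left, right = node
    hits = search(right, q)
    if q[dim] == 0:  # only then can documents lacking node `dim` still match
        hits = search(left, q) + hits
    return hits


def evaluate(query, position, root):
    """Encode a query tree and search the index; unknown paths match nothing."""
    q = to_vector(query, position)
    return [] if q is None else search(root, q)


docs = {
    "doc1": ("book", [("author", [("Smith", [])]), ("title", [("XML", [])])]),
    "doc2": ("book", [("author", [("Jones", [])])]),
    "doc3": ("article", [("author", [("Smith", [])])]),
}
position = build_hierarchy(docs.values())
items = [(doc_id, to_vector(t, position)) for doc_id, t in docs.items()]
root = build_kd(items, 0, len(position), leaf_size=1)
print(evaluate(("book", [("author", [("Smith", [])])]), position, root))  # ['doc1']
```

This path-set encoding handles only parent-child edges, not ancestor-descendant (//) edges. It can also report false positives when a query's branches are satisfied by different same-labelled nodes. For both reasons it should be read as an illustration of reducing matching to vector distance and of index-based pruning, not as the method the abstract describes.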