Query and data security in the data outsourcing model

A growing number of applications adopt the model of outsourcing data to a third party for storage and query processing. Under this model, database owners delegate their data management needs to third-party data servers. These servers possess both the resources (computation, communication, and storage) and the expertise needed to provide efficient and effective data management and query functionality to data and information consumers. However, because servers may be untrusted or compromised, query and data security issues must be addressed before this model can become practical. An important challenge is to enable the client to authenticate the query results returned by a server, that is, to verify that the results were correctly computed from the same data that the data owner published and that all results have been honestly returned. Existing solutions concentrate mostly on static, relational data and rely on idealized properties of certain cryptographic primitives, approaching the problem mostly from a theoretical perspective. This thesis proposes practical and efficient solutions that address this challenge for both relational and streaming data. Specifically, it provides dynamic authenticated index structures for authenticating range and aggregation queries in both one-dimensional and multidimensional spaces. It then discusses the authentication of sliding window queries over data streams, in support of data streaming applications. We also study the problem of query execution assurance over data streams, where the data owner and the client are the same entity. A probabilistic verification algorithm is presented that has minimal storage and update costs and achieves a failure probability of at most δ, for any small δ > 0. The algorithm is generalized to handle load shedding and multiple queries.
An extensive experimental evaluation of all the proposed methods over both synthetic and real data sets is presented. The findings demonstrate both the correctness and the effectiveness of the proposed methods.
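
To make the idea of query execution assurance with a small client-side synopsis concrete, the following is a minimal sketch of a generic polynomial-identity fingerprint for a group-by COUNT query over a stream. It is an illustration under stated assumptions, not the algorithm of this thesis, whose details the abstract does not give. The class name StreamFingerprint, the fixed prime P, and the restriction to non-negative counts and integer group ids in [0, P) are all assumptions made for this example. The client keeps one secret random value and one running product; a server answer that differs from the true counts is accepted with probability at most m/P, where m is the larger of the true and reported total counts.

```python
import secrets

# Assumption for this sketch: a fixed Mersenne prime as the modulus.
P = (1 << 61) - 1


class StreamFingerprint:
    """Constant-space fingerprint of the count vector of a stream.

    Maintains prod_i (alpha - i)^(v_i) mod P, where v_i is the number of
    stream items seen so far in group i and alpha is a secret random value.
    """

    def __init__(self, alpha=None):
        # alpha must stay hidden from the server.
        self.alpha = secrets.randbelow(P) if alpha is None else alpha % P
        self.value = 1

    def update(self, group_id, count=1):
        """Fold `count` new items of group `group_id` into the fingerprint."""
        factor = pow((self.alpha - group_id) % P, count, P)
        self.value = (self.value * factor) % P

    def matches(self, reported_counts):
        """Check a server-reported {group_id: count} answer against the fingerprint."""
        expected = 1
        for group_id, count in reported_counts.items():
            factor = pow((self.alpha - group_id) % P, count, P)
            expected = (expected * factor) % P
        return expected == self.value


if __name__ == "__main__":
    fingerprint = StreamFingerprint()
    for item in [1, 2, 2, 3, 3, 3]:
        fingerprint.update(item)
    print(fingerprint.matches({1: 1, 2: 2, 3: 3}))  # honest answer
    print(fingerprint.matches({1: 1, 2: 3, 3: 2}))  # tampered answer
```

The client's state is two integers regardless of the number of groups, and each update costs one modular exponentiation. Choosing P of at least m/δ would give a failure probability of at most δ in this sketch.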
