Secure and Reliable Data Outsourcing in Cloud Computing

The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics – keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure.

[1]  Randy H. Katz,et al.  Above the Clouds: A Berkeley View of Cloud Computing , 2009 .

[2]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[3]  Alexandra Boldyreva,et al.  Provably-Secure Schemes for Basic Query Support in Outsourced Databases , 2007, DBSec.

[4]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[5]  Mary Baker,et al.  Auditing to Keep Online Storage Services Honest , 2007, HotOS.

[6]  Ari Juels,et al.  Pors: proofs of retrievability for large files , 2007, CCS '07.

[7]  Cong Wang,et al.  Achieving usable and privacy-assured similarity search over outsourced cloud data , 2012, 2012 Proceedings IEEE INFOCOM.

[8]  Lucas Ballard,et al.  Achieving Efficient Conjunctive Keyword Searches over Encrypted Data , 2005, ICICS.

[9]  Alistair Moffat,et al.  Exploring the similarity space , 1998, SIGF.

[10]  Yunnan Wu,et al.  Reducing repair traffic for erasure coding-based storage via interference alignment , 2009, 2009 IEEE International Symposium on Information Theory.

[11]  Cong Wang,et al.  Privacy-preserving multi-keyword ranked search over encrypted cloud data , 2011, 2011 Proceedings IEEE INFOCOM.

[12]  Cong Wang,et al.  Secure Ranked Keyword Search over Encrypted Cloud Data , 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.

[13]  Mihir Bellare,et al.  Incremental Cryptography: The Case of Hashing and Signing , 1994, CRYPTO.

[14]  Rafail Ostrovsky,et al.  Cryptography from Anonymity , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[15]  Kannan Ramchandran,et al.  Exact Regeneration Codes for Distributed Storage Repair Using Interference Alignment , 2009, ArXiv.

[16]  H. V. Jagadish,et al.  PrivatePond: Outsourced Management of Web Corpuses , 2009, WebDB.

[17]  Michael Burrows,et al.  A Cooperative Internet Backup Scheme , 2003, USENIX Annual Technical Conference, General Track.

[18]  Anne-Marie Kermarrec,et al.  LT Network Codes , 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.

[19]  Yunnan Wu,et al.  A Construction of Systematic MDS Codes With Minimum Repair Bandwidth , 2009, IEEE Transactions on Information Theory.

[20]  Rafail Ostrovsky,et al.  Public Key Encryption That Allows PIR Queries , 2007, CRYPTO.

[21]  Cong Wang,et al.  Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing , 2009, ESORICS.

[22]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[23]  Aris M. Ouksel,et al.  Multidimensional B-trees for associative searching in database systems , 1982, Inf. Syst..

[24]  Elaine Shi,et al.  Predicate Privacy in Encryption Systems , 2009, IACR Cryptol. ePrint Arch..

[25]  Paulo S. L. M. Barreto,et al.  Demonstrating data possession and uncheatable data transfer , 2006, IACR Cryptol. ePrint Arch..

[26]  Yunnan Wu,et al.  A Survey on Network Codes for Distributed Storage , 2010, Proceedings of the IEEE.

[27]  Rafail Ostrovsky,et al.  Replication is not needed: single database, computationally-private information retrieval , 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[28]  Jeroen Doumen,et al.  Using Secret Sharing for Searching in Encrypted Data , 2004, Secure Data Management.

[29]  Nikos Mamoulis,et al.  Secure kNN computation on encrypted databases , 2009, SIGMOD Conference.

[30]  Alberto Del Bimbo,et al.  Efficient Matching and Indexing of Graph Models in Content-Based Retrieval , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Robert H. Deng,et al.  Private Information Retrieval Using Trusted Hardware , 2006, IACR Cryptol. ePrint Arch..

[32]  Kannan Ramchandran,et al.  Exact Regenerating Codes for Distributed Storage , 2009, ArXiv.

[33]  Pieter H. Hartel,et al.  Towards an Information Theoretic Analysis of Searchable Encryption , 2008, ICICS.

[34]  A. Dimakis,et al.  Deterministic Regenerating Codes for Distributed Storage Yunnan , 2007 .

[35]  Ethan L. Miller,et al.  Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage , 2006, 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06).

[36]  Guan-Ming Su,et al.  Confidentiality-preserving rank-ordered search , 2007, StorageSS '07.

[37]  Jia Xu,et al.  Remote Integrity Check with Dishonest Storage Server , 2008, ESORICS.

[38]  Tsuyoshi Takagi,et al.  Efficient Conjunctive Keyword-Searchable Encryption , 2007, 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07).

[39]  Rafail Ostrovsky,et al.  Searchable symmetric encryption: improved definitions and efficient constructions , 2006, CCS '06.

[40]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[41]  Melissa Chase,et al.  Structured Encryption and Controlled Disclosure , 2010, IACR Cryptol. ePrint Arch..

[42]  Robert H. Deng,et al.  Private Query on Encrypted Data in Multi-user Settings , 2008, ISPEC.

[43]  Cong Wang,et al.  Efficient verifiable fuzzy keyword search over encrypted data in cloud computing , 2013, Comput. Sci. Inf. Syst..

[44]  Reza Curtmola,et al.  Remote data checking for network coding-based distributed storage systems , 2010, CCSW '10.

[45]  Zhenyu Yang,et al.  LT codes-based secure and reliable cloud storage service , 2012, 2012 Proceedings IEEE INFOCOM.

[46]  Salve Bhagyashri Salve Bhagyashri,et al.  Privacy-Preserving Public Auditing For Secure Cloud Storage , 2014 .

[47]  Pil Joong Lee,et al.  Public Key Encryption with Conjunctive Keyword Search and Its Extension to a Multi-user System , 2007, Pairing.

[48]  Brent Waters,et al.  Anonymous Hierarchical Identity-Based Encryption (Without Random Oracles) , 2006, CRYPTO.

[49]  Roberto Di Pietro,et al.  Scalable and efficient provable data possession , 2008, IACR Cryptol. ePrint Arch..

[50]  Ernst W. Biersack,et al.  A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.

[51]  M. Mrinalni Vaknishadh,et al.  Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing , 2012 .

[52]  Kihyun Kim,et al.  Public Key Encryption with Conjunctive Field Keyword Search , 2004, WISA.

[53]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[54]  Cong Wang,et al.  Enabling Efficient Fuzzy Keyword Search over Encrypted Data in Cloud Computing , 2009, IACR Cryptol. ePrint Arch..

[55]  Reza Curtmola,et al.  Provable data possession at untrusted stores , 2007, CCS '07.

[56]  Joonsang Baek,et al.  Public Key Encryption with Keyword Search Revisited , 2008, ICCSA.

[57]  Yunnan Wu Existence and construction of capacity-achieving network codes for distributed storage , 2009, 2009 IEEE International Symposium on Information Theory.

[58]  Elaine Shi,et al.  Multi-Dimensional Range Query over Encrypted Data , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[59]  Cong Wang,et al.  Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data , 2012, IEEE Transactions on Parallel and Distributed Systems.

[60]  Hovav Shacham,et al.  Compact Proofs of Retrievability , 2008, Journal of Cryptology.

[61]  Michael K. Reiter,et al.  Space-Efficient Block Storage Integrity , 2005, NDSS.

[62]  Jonathan Katz,et al.  Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products , 2008, Journal of Cryptology.

[63]  Ari Juels,et al.  Proofs of retrievability: theory and implementation , 2009, CCSW '09.

[64]  Cong Wang,et al.  Ensuring data storage security in Cloud Computing , 2009, 2009 17th International Workshop on Quality of Service.

[65]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[66]  Rafail Ostrovsky,et al.  Public Key Encryption with Keyword Search , 2004, EUROCRYPT.

[67]  Wolfgang Nejdl,et al.  Zerber+R: top-k retrieval from a confidential index , 2009, EDBT '09.

[68]  Mihir Bellare,et al.  Deterministic and Efficiently Searchable Encryption , 2007, CRYPTO.

[69]  Ming Li,et al.  Authorized Private Keyword Search over Encrypted Data in Cloud Computing , 2011, 2011 31st International Conference on Distributed Computing Systems.

[70]  Jeroen Doumen,et al.  Searching in encrypted data , 2004 .

[71]  Rafail Ostrovsky,et al.  Private Searching on Streaming Data , 2005, Journal of Cryptology.

[72]  Ramakrishnan Srikant,et al.  Order preserving encryption for numeric data , 2004, SIGMOD '04.

[73]  Sebastian Nowozin,et al.  gBoost: a mathematical programming approach to graph classification and regression , 2009, Machine Learning.

[74]  Brent Waters,et al.  Secure Conjunctive Keyword Search over Encrypted Data , 2004, ACNS.

[75]  Cong Wang,et al.  Toward publicly auditable secure cloud data storage services , 2010, IEEE Network.

[76]  Michael Mitzenmacher,et al.  Privacy Preserving Keyword Searches on Remote Encrypted Data , 2005, ACNS.

[77]  Peter Williams,et al.  Building castles out of mud: practical access pattern privacy and correctness on untrusted storage , 2008, CCS.

[78]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[79]  Cong Wang,et al.  Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data , 2014 .

[80]  Cong Wang,et al.  Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing , 2010, 2010 Proceedings IEEE INFOCOM.

[81]  Tracey Ho,et al.  A Random Linear Network Coding Approach to Multicast , 2006, IEEE Transactions on Information Theory.

[82]  Allison Bishop,et al.  Fully Secure Functional Encryption: Attribute-Based Encryption and (Hierarchical) Inner Product Encryption , 2010, EUROCRYPT.

[83]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[84]  Reza Curtmola,et al.  MR-PDP: Multiple-Replica Provable Data Possession , 2008, 2008 The 28th International Conference on Distributed Computing Systems.

[85]  Daniel A. Spielman,et al.  Efficient erasure correcting codes , 2001, IEEE Trans. Inf. Theory.

[86]  Michael Luby,et al.  LT codes , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[87]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[88]  Mihir Bellare,et al.  Searchable Encryption Revisited: Consistency Properties, Relation to Anonymous IBE, and Extensions , 2005, Journal of Cryptology.

[89]  Yevgeniy Dodis,et al.  Proofs of Retrievability via Hardness Amplification , 2009, IACR Cryptol. ePrint Arch..

[90]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[91]  Elaine Shi,et al.  Evaluating predicates over encrypted data , 2008 .

[92]  Ian H. Witten,et al.  Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .

[93]  Kristin E. Lauter,et al.  Cryptographic Cloud Storage , 2010, Financial Cryptography Workshops.

[94]  Huaxia Xia,et al.  RobuSTore: a distributed storage architecture with robust and high performance , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[95]  Cong Wang,et al.  Achieving Secure, Scalable, and Fine-grained Data Access Control in Cloud Computing , 2010, 2010 Proceedings IEEE INFOCOM.

[96]  Ari Juels,et al.  HAIL: a high-availability and integrity layer for cloud storage , 2009, CCS.

[97]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[98]  Chris Rose,et al.  A Break in the Clouds: Towards a Cloud Definition , 2011 .

[99]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[100]  Alexandros G. Dimakis,et al.  Searching for Minimum Storage Regenerating Codes , 2009, ArXiv.

[101]  Peter Sanders,et al.  Polynomial time algorithms for network information flow , 2003, SPAA '03.

[102]  Cong Wang,et al.  Privacy-Preserving Query over Encrypted Graph-Structured Data in Cloud Computing , 2011, 2011 31st International Conference on Distributed Computing Systems.

[103]  Cong Wang,et al.  Toward Secure and Dependable Storage Services in Cloud Computing , 2012, IEEE Transactions on Services Computing.

[104]  Marianne Winslett,et al.  Zerber: r-confidential indexing for distributed documents , 2008, EDBT '08.

[105]  Kannan Ramchandran,et al.  Explicit codes minimizing repair bandwidth for distributed storage , 2009, 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo).

[106]  Moni Naor,et al.  The Complexity of Online Memory Checking , 2005, FOCS.

[107]  Jaroslav Pokorný,et al.  Extending Fagin's Algorithm for More Users Based on Multidimensional B-Tree , 2008, ADBIS.

[108]  Nathan Chenette,et al.  Order-Preserving Symmetric Encryption , 2009, IACR Cryptol. ePrint Arch..

[109]  Dawn Xiaodong Song,et al.  Practical techniques for searches on encrypted data , 2000, Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000.

[110]  Brent Waters,et al.  Conjunctive, Subset, and Range Queries on Encrypted Data , 2007, TCC.

[111]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[112]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[113]  Xin Wang,et al.  Tree-structured data regeneration with network coding in distributed storage systems , 2009, 2009 17th International Workshop on Quality of Service.

[114]  Pil Joong Lee,et al.  Searchable Keyword-Based Encryption , 2005, IACR Cryptol. ePrint Arch..

[115]  Radu Sion,et al.  Conjunctive Keyword Search on Encrypted Data with Completeness and Computational Privacy , 2005, IACR Cryptol. ePrint Arch..

[116]  Mary Baker,et al.  Privacy-Preserving Audit and Extraction of Digital Contents , 2008, IACR Cryptol. ePrint Arch..

[117]  Brent Waters,et al.  Building an Encrypted and Searchable Audit Log , 2004, NDSS.