A Data Preparation Approach for Cloud Storage Based on Containerized Parallel Patterns

In this paper, we present the design, implementation, and evaluation of an efficient data preparation and retrieval approach for cloud storage. The approach includes a deduplication subsystem that indexes the hash of each content to identify duplicated data. Avoiding duplicated content reduces reprocessing time during uploads, as well as other costs related to outsourced data management tasks. The proposed data preparation scheme enables organizations to add properties such as security, reliability, and cost-efficiency to their contents before sending them to the cloud. It also creates recovery schemes that allow organizations to share preprocessed contents with partners and end-users. The approach further includes an engine that encapsulates preprocessing applications into virtual containers (VCs) to create parallel patterns that improve the efficiency of the data preparation and retrieval processes. In a case study, real repositories of satellite images and organizational files were prepared for migration to the cloud using processes such as compression, encryption, encoding for fault tolerance, and access control. The experimental evaluation revealed the feasibility of using a data preparation approach to help organizations mitigate risks that could still arise in the cloud. It also revealed the efficiency of the deduplication process in reducing data preparation tasks and the efficacy of parallel patterns in improving the end-user service experience.
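The hash-indexed deduplication described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `DedupIndex`, its methods, the choice of SHA3-256 as the content hash, and the use of zlib compression as a stand-in for the full preparation chain (compression, encryption, encoding, access control) are all assumptions made for illustration.

```python
import hashlib
import zlib


class DedupIndex:
    """Hypothetical index mapping a content hash to its prepared payload."""

    def __init__(self):
        self._prepared = {}      # digest -> prepared (here: compressed) bytes
        self.prepare_calls = 0   # counts how often preparation actually ran

    def _prepare(self, content: bytes) -> bytes:
        # Stand-in for the preparation stages; only compression is shown.
        self.prepare_calls += 1
        return zlib.compress(content)

    def upload(self, content: bytes) -> str:
        # Assumption: SHA3-256 is used as the content hash.
        digest = hashlib.sha3_256(content).hexdigest()
        if digest not in self._prepared:
            # First time this content is seen: prepare and index it.
            self._prepared[digest] = self._prepare(content)
        # Duplicates skip preparation and reuse the indexed payload.
        return digest

    def retrieve(self, digest: str) -> bytes:
        # Reverse the stand-in preparation to recover the original content.
        return zlib.decompress(self._prepared[digest])
```

Uploading the same content twice returns the same digest and runs the preparation step only once, which is the source of the reprocessing savings the abstract mentions.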
