StoreSim: Optimizing Information Leakage in Multicloud Storage Services

Many schemes have recently been proposed for storing data on multiple clouds. Distributing data across different cloud storage providers (CSPs) automatically gives users a certain degree of information leakage control, since no single point of attack can leak all of a user's information. However, unplanned distribution of data chunks can lead to high information disclosure even when multiple clouds are used. To address this problem, we present StoreSim, an information-leakage-aware storage system for multicloud environments. StoreSim aims to store syntactically similar data on the same cloud, thus minimizing the user's information leakage across multiple clouds. We design an approximate algorithm that efficiently generates similarity-preserving signatures for data chunks based on MinHash and Bloom filters, and a function that computes information leakage from these signatures. Next, we present an effective storage plan generation algorithm, based on clustering, for distributing data chunks across multiple clouds with minimal information leakage. Finally, we evaluate our scheme on two real datasets from Wikipedia and GitHub and show that it can reduce information leakage by up to 60% compared to unplanned placement.
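
To make the idea of a similarity-preserving signature concrete, the following is a minimal sketch of plain MinHash over character shingles. It is an illustration under stated assumptions, not the StoreSim algorithm itself: the shingle length, the number of hash functions, the use of salted BLAKE2b as the hash family, and the function names `shingles`, `minhash_signature`, and `estimated_similarity` are all hypothetical choices, and the Bloom filter encoding mentioned in the abstract is omitted.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of overlapping k-character substrings of text."""
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """Build a MinHash signature: one minimum hash value per seeded hash function.

    Assumption: salted BLAKE2b stands in for a family of independent hash functions.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        min_val = min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        )
        signature.append(min_val)
    return signature


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Two near-duplicate chunks and one unrelated chunk (made-up example data).
chunk_a = "the quick brown fox jumps over the lazy dog"
chunk_b = "the quick brown fox jumps over the lazy cat"
chunk_c = "completely unrelated content 1234567890"

sig_a = minhash_signature(shingles(chunk_a))
sig_b = minhash_signature(shingles(chunk_b))
sig_c = minhash_signature(shingles(chunk_c))

print("a vs b:", estimated_similarity(sig_a, sig_b))
print("a vs c:", estimated_similarity(sig_a, sig_c))
```

In a placement scheme of the kind the abstract describes, chunks whose signatures agree in many positions would be candidates for the same cloud, so that a breach of any one provider reveals less distinct information.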
