Deduplication and compression techniques in cloud design

Our approach to deduplication and compression in cloud computing aims to reduce storage space and bandwidth usage during file transfers. The design relies on several metadata structures for deduplication: the existence of duplicate files is determined from this metadata, and only one copy of each duplicate file is retained while the others are deleted. Files are first clustered into bins according to their size, then segmented, deduplicated, compressed, and stored. Binning constrains the number of segments and their sizes so that segmentation remains suited to each file size. When a user requests a file, the compressed segments of that file are sent over the network together with the file-to-segment mapping; these are then decompressed and combined to reconstruct the complete file, minimizing bandwidth requirements. A sketch of this pipeline is given below.
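
The following minimal Python sketch illustrates how such a pipeline could be structured, assuming SHA-256 fingerprints for duplicate detection, zlib for compression, and in-memory dictionaries standing in for the metadata structures. The bin thresholds, segment sizes, and helper names are illustrative assumptions, not the parameters of the actual design.

```python
import hashlib
import zlib

# Illustrative bins: each entry maps a file-size range to a segment size.
# These thresholds are assumptions for the sketch, not the design's values.
BINS = [
    (1 * 1024 * 1024,   64 * 1024),    # files < 1 MiB   -> 64 KiB segments
    (64 * 1024 * 1024,  256 * 1024),   # files < 64 MiB  -> 256 KiB segments
    (float("inf"),      1024 * 1024),  # larger files    -> 1 MiB segments
]

segment_store = {}   # segment hash -> compressed segment (stored once, deduplicated)
file_metadata = {}   # file name    -> ordered list of segment hashes (file-to-segment mapping)


def segment_size_for(file_size):
    """Pick the segment size of the bin this file size falls into."""
    for limit, seg_size in BINS:
        if file_size < limit:
            return seg_size
    return BINS[-1][1]


def store_file(name, data):
    """Segment, deduplicate, compress, and record the file-to-segment mapping."""
    seg_size = segment_size_for(len(data))
    mapping = []
    for offset in range(0, len(data), seg_size):
        segment = data[offset:offset + seg_size]
        digest = hashlib.sha256(segment).hexdigest()
        if digest not in segment_store:
            # Only one copy of a duplicate segment is kept.
            segment_store[digest] = zlib.compress(segment)
        mapping.append(digest)
    file_metadata[name] = mapping


def fetch_file(name):
    """Retrieve the compressed segments and mapping, then decompress and reassemble."""
    mapping = file_metadata[name]
    return b"".join(zlib.decompress(segment_store[d]) for d in mapping)
```

In this sketch, duplicate detection happens at segment granularity via content hashes, so identical content shared across files is stored and transferred only once, while the per-file mapping preserves enough information to rebuild each file on request.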
