UCDC: Unlimited Content-Defined Chunking, a File-Differencing Method Applied to File Synchronization among Multiple Hosts

Data-centric systems now play an increasingly important role in blog sharing, content delivery, news broadcasting, file synchronization, and similar applications. Because of the large amount of data these systems generate, data backup and archiving have become a major challenge. A common way to address this problem is chunking-based deduplication, which eliminates redundant data and reduces the total storage space required. In this paper, we survey several file-differencing methods and then design a new Unlimited Content-Defined Chunking (UCDC) algorithm, which consists of three stages: file chunking, file comparing, and file merging. We evaluate the effectiveness of UCDC with a simulation example that produces a description of the file.
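The three stages named above (chunking, comparing, merging) can be illustrated with a short Python sketch. This is not the authors' UCDC algorithm, whose details are not given here; it is a generic content-defined chunker that uses a Rabin-Karp-style rolling hash over a fixed window and cuts a chunk when the low bits of the hash are zero. The function names `chunk`, `diff` and `merge`, the SHA-1 chunk fingerprints, and the parameters `WINDOW`, `MASK_BITS`, `MIN_CHUNK` and `MAX_CHUNK` are hypothetical choices made for illustration. The sketch keeps a conventional maximum chunk size; whether UCDC imposes such a bound is not stated here.

```python
import hashlib
import random
from typing import Dict, List, Tuple, Union

# Hypothetical parameters (not taken from the UCDC paper).
WINDOW = 16       # bytes covered by the rolling hash
MASK_BITS = 6     # boundary when the low 6 bits are zero: about 1 in 64 positions
MIN_CHUNK = 32    # must be >= WINDOW so every boundary test sees a full window
MAX_CHUNK = 1024  # forced cut if no boundary is found
BASE = 257
MOD = 1 << 32

Op = Tuple[str, Union[int, bytes]]  # ("copy", old_chunk_index) or ("data", literal)


def chunk(data: bytes) -> List[bytes]:
    """Chunking: split data at content-defined boundaries with a rolling hash."""
    chunks: List[bytes] = []
    start, h = 0, 0
    mask = (1 << MASK_BITS) - 1
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte sliding out of the window
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i - start >= WINDOW:  # window over-full: remove the oldest byte
            h = (h - data[i - WINDOW] * drop) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & mask) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def diff(old: bytes, new: bytes) -> List[Op]:
    """Comparing: describe `new` as references to chunks of `old` plus literals."""
    index: Dict[bytes, int] = {}
    for idx, c in enumerate(chunk(old)):
        index.setdefault(hashlib.sha1(c).digest(), idx)
    ops: List[Op] = []
    for c in chunk(new):
        idx = index.get(hashlib.sha1(c).digest())
        ops.append(("copy", idx) if idx is not None else ("data", c))
    return ops


def merge(old: bytes, ops: List[Op]) -> bytes:
    """Merging: rebuild `new` from the old file's chunks and the delta."""
    old_chunks = chunk(old)
    return b"".join(old_chunks[arg] if kind == "copy" else arg for kind, arg in ops)


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.randrange(256) for _ in range(20000))
    new = old[:5000] + b"inserted bytes" + old[5000:]
    delta = diff(old, new)
    assert merge(old, delta) == new
    literal = sum(len(arg) for kind, arg in delta if kind == "data")
    print(f"{len(delta)} chunks in delta, {literal} literal bytes")
```

Because each boundary decision depends only on the bytes inside the hash window (plus the minimum-size rule), an insertion disturbs only the chunks near it; the chunk sequence realigns afterwards, so the delta consists mostly of `copy` references. In a multi-host setting, only the `data` entries and the chunk indices would need to be transferred.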
