A survey and comparative study of data deduplication techniques

The enormous growth of digital data demands ever more storage space, which in turn significantly increases the cost of backup and degrades its performance. Traditional backup solutions do not provide any inherent capability to prevent duplicate data from being backed up. Backing up duplicate data significantly increases backup time and unnecessarily consumes resources. Data deduplication plays an important role in eliminating this redundant data and reducing storage consumption. Its main aims are to detect duplicates efficiently, remove them at high speed, and achieve a good duplicate removal ratio. Many mechanisms have been proposed to meet these objectives, but it has been observed that improving one objective may come at the expense of another. In this paper, we classify and review existing deduplication techniques. We also evaluate deduplication algorithms by measuring their performance in terms of complexity and efficiency on unstructured files. We propose an efficient means of achieving a high deduplication ratio with a minimal backup window. Finally, we highlight key issues to be addressed in future work.
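To make the underlying idea concrete before the survey proper, the following is a minimal sketch of fixed-size-chunk, hash-based deduplication, one common approach among the techniques reviewed in this paper. The function names, the use of SHA-256 fingerprints, and the chunk size are illustrative assumptions, not a description of any particular system evaluated here.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy per unique chunk."""
    store = {}   # fingerprint -> chunk bytes (unique chunks only)
    recipe = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()  # chunk fingerprint
        if fp not in store:                     # store only unseen chunks
            store[fp] = chunk
        recipe.append(fp)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)

if __name__ == "__main__":
    data = b"abcd" * 1000 + b"wxyz" * 1000  # highly redundant input
    store, recipe = deduplicate(data, chunk_size=4)
    assert restore(store, recipe) == data
    ratio = len(data) / sum(len(c) for c in store.values())
    print(f"unique chunks: {len(store)}, dedup ratio: {ratio:.0f}:1")
```

On this artificial input the store holds only two unique chunks, giving a 1000:1 deduplication ratio; real workloads see far lower ratios, and the trade-offs between chunking strategy, fingerprint indexing speed, and achievable ratio are exactly the tensions among objectives examined in this survey.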