A survey and comparative study of data deduplication techniques

The enormous growth of digital data demands ever more storage space, which in turn significantly increases the cost of backup and degrades its performance. Traditional backup solutions do not provide any inherent capability to prevent duplicate data from being backed up. Backing up duplicate data significantly increases backup time and unnecessarily consumes resources. Data deduplication plays an important role in eliminating this redundant data and reducing storage consumption. Its main aims are to detect duplicates efficiently, remove them at high speed, and achieve a good duplicate removal ratio. Many mechanisms have been proposed to meet these objectives, but it has been observed that improving one objective may come at the expense of another. In this paper, we classify and review existing deduplication techniques. We also evaluate deduplication algorithms by measuring their performance in terms of complexity and efficiency on unstructured files. We propose an efficient means of achieving a high deduplication ratio with a minimal backup window. Finally, we highlight key issues to be addressed in future work.
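To make the underlying idea concrete before the survey proper, the following is a minimal sketch of fixed-size-chunk, hash-based deduplication, one common approach among the techniques reviewed in this paper. The function names, the use of SHA-256 fingerprints, and the chunk size are illustrative assumptions, not a description of any particular system evaluated here.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy per unique chunk."""
    store = {}   # fingerprint -> chunk bytes (unique chunks only)
    recipe = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()  # chunk fingerprint
        if fp not in store:                     # store only unseen chunks
            store[fp] = chunk
        recipe.append(fp)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)

if __name__ == "__main__":
    data = b"abcd" * 1000 + b"wxyz" * 1000  # highly redundant input
    store, recipe = deduplicate(data, chunk_size=4)
    assert restore(store, recipe) == data
    ratio = len(data) / sum(len(c) for c in store.values())
    print(f"unique chunks: {len(store)}, dedup ratio: {ratio:.0f}:1")
```

On this artificial input the store holds only two unique chunks, giving a 1000:1 deduplication ratio; real workloads see far lower ratios, and the trade-offs between chunking strategy, fingerprint indexing speed, and achievable ratio are exactly the tensions among objectives examined in this survey.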