Abstract This chapter presents the history, concepts, and background of data deduplication approaches. We motivate deduplication by pointing to the explosive growth of data and the large amount of redundancy it contains, and we explain the shortcomings of existing solutions for removing redundant data. We describe data optimization from client to server across the network. We also review the main types of storage and how they differ, namely redundant arrays of independent disks (RAID), direct attached storage (DAS), storage area networks (SAN), and network attached storage (NAS). We categorize deduplication approaches by granularity, covering file-level, block-level (fixed and variable), and content-aware deduplication; by location, covering source and target deduplication; and by when deduplication is performed, covering inline and post-process methods. Finally, we compare compression with data deduplication and discuss the challenges in data deduplication.