Data deduplication concepts

Abstract This chapter presents the history, concepts, and background of data deduplication approaches. We motivate deduplication by pointing to the explosive growth of data and the large amount of redundancy it contains, and we explain the shortcomings of existing solutions for removing redundant data. We describe data optimization along the path from client to server through the network. We also review the main types of storage and how they differ, namely redundant arrays of independent disks, direct attached storage, storage area network, and network attached storage. We categorize deduplication approaches by granularity, covering file-level, block-level (fixed and variable), and content-aware deduplication. Based on where deduplication takes place, we distinguish source and target deduplication. Based on the deduplication process, we present inline and post-process methods. Finally, we compare compression with data deduplication and discuss the challenges in data deduplication.