MMD: An Approach to Improve Reading Performance in Deduplication Systems

Data deduplication has been widely used in backup systems and in primary storage such as virtual machine platforms. However, read performance in these systems suffers from the chunk fragmentation that deduplication introduces, so improving read performance in deduplication systems has become an important problem. In this paper, we first propose MMD, a new storage method that uses multiple disks to boost read performance. MMD exploits multiple disks operating in parallel, each of which is used as an independent logical device. We then present a deduplication model based on MMD that optimizes the layout of data on the disks to improve read speed. Within this model, we discuss two I/O scheduling algorithms that assign the containers of a deduplication system to appropriate disks. Experiments show that MMD achieves a clear improvement in read performance over RAID in deduplication systems.
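The abstract does not specify the two scheduling algorithms. As a rough illustration of what assigning containers to independent disks could look like, the following sketch uses a classic greedy heuristic, longest-processing-time-first (LPT), which is a swapped-in technique and not the paper's method. It assumes each container has a known read cost (the hypothetical `costs` map) and that the disks are read in parallel, so a restore takes roughly as long as the busiest disk rather than the sum of all reads.

```python
import heapq
from typing import Dict, List, Tuple


def assign_containers(costs: Dict[str, float], num_disks: int) -> Tuple[Dict[str, int], List[float]]:
    """Greedy LPT placement (illustrative assumption, not MMD's algorithm).

    Places each container, largest read cost first, on the disk with the
    smallest accumulated load. Returns a container -> disk mapping and the
    resulting per-disk load.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    # Min-heap of (current load, disk id); ties go to the lower disk id.
    heap = [(0.0, d) for d in range(num_disks)]
    heapq.heapify(heap)
    placement: Dict[str, int] = {}
    loads = [0.0] * num_disks
    # Descending cost, then container name, so the result is deterministic.
    for cid, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, disk = heapq.heappop(heap)
        placement[cid] = disk
        loads[disk] = load + cost
        heapq.heappush(heap, (loads[disk], disk))
    return placement, loads


if __name__ == "__main__":
    # Hypothetical read costs for five containers needed by one restore.
    costs = {"c1": 4, "c2": 3, "c3": 3, "c4": 2, "c5": 2}
    placement, loads = assign_containers(costs, num_disks=2)
    print(placement)  # {'c1': 0, 'c2': 1, 'c3': 1, 'c4': 0, 'c5': 0}
    print(loads)      # [8.0, 6.0]
    # Under the parallel-read assumption, restore time ~ max(loads) = 8.0,
    # versus sum(loads) = 14.0 if every container sat on a single disk.
```

LPT is only a heuristic: in this example an optimal split ({c1, c2} and {c3, c4, c5}) would bring the busiest disk down to 7, which is why more careful scheduling of container placement can matter.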
