A Parallel Architecture for In-Line Data De-duplication

Data de-duplication has recently received broad attention from both academia and industry. Some research focuses on eliminating more redundant data, while other work investigates how to perform de-duplication at high speed. In this paper, we show the importance of data de-duplication in the current digital world and aim to reduce its time and space requirements. We present a parallel architecture in which one node is designated as a server alongside multiple storage nodes. All nodes, including the server, can perform block-level in-line de-duplication in parallel. We have built a prototype of the system and present some performance results. The proposed system uses magnetic disks as its storage technology.
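
As a rough illustration of block-level in-line de-duplication running on several nodes at once, the following minimal Python sketch may help. It is not the prototype described above. The fixed 4096-byte block size, the SHA-256 fingerprints, the in-memory index, the use of threads to stand in for nodes, and all names (Node, split_blocks, dedup_parallel) are assumptions made for illustration only.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed fixed block size; the abstract does not specify one


class Node:
    """Hypothetical node (server or storage) with its own fingerprint index."""

    def __init__(self):
        self.store = {}  # fingerprint -> block bytes (stands in for disk storage)

    def write_blocks(self, blocks):
        """De-duplicate in-line: each block is checked before it is stored."""
        recipe = []
        for block in blocks:
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.store:  # only previously unseen blocks are stored
                self.store[fp] = block
            recipe.append(fp)  # the recipe lets the stream be rebuilt later
        return recipe

    def read(self, recipe):
        """Rebuild a stream from its list of fingerprints."""
        return b"".join(self.store[fp] for fp in recipe)


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_parallel(streams, nodes):
    """Give each node one stream and let all nodes de-duplicate concurrently."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [
            pool.submit(node.write_blocks, split_blocks(stream))
            for node, stream in zip(nodes, streams)
        ]
        return [f.result() for f in futures]


streams = [b"A" * 8192 + b"B" * 4096, b"A" * 4096]
nodes = [Node(), Node()]
recipes = dedup_parallel(streams, nodes)
```

In this sketch each node keeps a private index, so a block that appears on two different nodes is stored twice. Whether and how the actual system shares fingerprints between nodes is not stated in the abstract.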
