Paper Information - A Parallel Architecture for In-Line Data De-duplication

A Parallel Architecture for In-Line Data De-duplication

Recently, data de-duplication, a rapidly emerging technology, has received broad attention from both academia and industry. Some research focuses on approaches that eliminate more redundant data, while other work investigates how to perform de-duplication at high speed. In this paper, we show the importance of data de-duplication in today's digital world and aim to reduce its time and space requirements. We then present a parallel architecture with one node designated as a server and multiple storage nodes. All nodes, including the server, can perform block-level in-line de-duplication in parallel. We have built a prototype of the system and present some performance results. The proposed system uses magnetic disks as its storage technology.
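
As an illustration of what block-level in-line de-duplication means on a single node, the following is a minimal sketch and not the prototype described in the paper. The fixed 4096-byte block size, the SHA-256 fingerprint, and the names `DedupStore`, `write`, and `read` are all assumptions made for this example; the abstract does not specify them. The parallel distribution of work across the server and storage nodes is not modeled.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; not specified in the abstract


class DedupStore:
    """Toy in-memory block store keyed by SHA-256 fingerprint (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, each unique block stored once

    def write(self, data: bytes) -> list:
        """Split data into fixed-size blocks and de-duplicate them as they arrive.

        Returns the list of fingerprints (the "recipe") needed to rebuild data.
        """
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # In-line: the duplicate check happens before the block is stored.
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe of fingerprints."""
        return b"".join(self.blocks[fp] for fp in recipe)


store = DedupStore()
data = b"a" * (3 * BLOCK_SIZE) + b"b" * BLOCK_SIZE
recipe = store.write(data)
```

Here the four logical blocks written reduce to two stored blocks, because the three identical blocks share one fingerprint, and `store.read(recipe)` reconstructs the original bytes.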

M. Mishra | S. S. Sengar
