3DNBS: A Data De-duplication Disk-Based Network Backup System

Traditionally, backup and archiving have been performed on tape. With the rapid advances in disk storage technology in recent years, it has become practical to use disks rather than tape libraries as the backend storage device for a backup system. For such a disk-based system, storage space efficiency is essential. Because traditional backup methods cannot eliminate redundancy during backup, a data de-duplication backup technique is needed to store data more efficiently. This paper describes the design and performance evaluation of a data de-duplication disk-based network backup system called 3DNBS. 3DNBS breaks files into variable-sized chunks using content-defined chunking (CDC) for the purpose of duplicate detection. Chunks are indexed and addressed by a hash of their content, which makes the storage intrinsically single-instance. Experimental results show that, compared with a traditional backup system such as Bacula, 3DNBS dramatically reduces the required storage space on various workloads. By eliminating duplicate data, 3DNBS also reduces the amount of data to be transmitted, and hence the time needed to perform a backup in a bandwidth-constrained environment.
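
The combination of content-defined chunking and content-addressed storage described in the abstract can be illustrated with a minimal sketch. It is not the 3DNBS implementation: the abstract does not name the rolling hash, the chunk-size limits, or the digest, so the polynomial rolling hash, the constants, SHA-256, and the names `chunk_boundaries` and `ChunkStore` are all assumptions chosen for illustration.

```python
import hashlib
import random

# All parameters below are illustrative assumptions, not values from the paper.
WINDOW = 48               # bytes covered by the rolling hash
MASK = (1 << 11) - 1      # cut when the low 11 bits of the hash are zero
MIN_CHUNK = 512           # never cut before this many bytes
MAX_CHUNK = 8192          # always cut at this many bytes
BASE = 257
MOD = (1 << 61) - 1
POW = pow(BASE, WINDOW, MOD)  # weight of the byte that leaves the window


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks of data."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, drop the one WINDOW bytes back.
        h = (h * BASE + byte) % MOD
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)  # trailing partial chunk


class ChunkStore:
    """Hypothetical single-instance store: chunks are keyed by their digest."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def backup(self, data: bytes):
        """Store data; return (recipe of digests, number of new bytes stored)."""
        recipe = []
        new_bytes = 0
        for s, e in chunk_boundaries(data):
            chunk = data[s:e]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost space
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild the original bytes from a recipe of digests."""
        return b"".join(self.chunks[d] for d in recipe)


if __name__ == "__main__":
    rng = random.Random(0)
    v1 = bytes(rng.getrandbits(8) for _ in range(100_000))
    v2 = v1[:50_000] + b"inserted bytes" + v1[50_000:]  # small edit mid-file
    store = ChunkStore()
    _, new1 = store.backup(v1)
    _, new2 = store.backup(v2)
    print("new bytes stored for v1:", new1)
    print("new bytes stored for v2:", new2)
```

Because each cut point depends only on the last `WINDOW` bytes of content, an insertion in the middle of a file shifts later chunk boundaries along with the data instead of invalidating every subsequent chunk. The second backup therefore stores only the chunks around the edit, which is also why less data would need to cross the network.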
