Towards Fast De-duplication Using Low Energy Coprocessor

Backup technology based on data de-duplication has become a hot topic. To improve performance, prior research has mainly focused on reducing disk access time. In this paper, we consider the computational complexity of data de-duplication systems and try to improve system performance by reducing computing time. We offload computing tasks to a commodity coprocessor to speed up computation. Compared with general-purpose processors, commodity coprocessors have lower energy consumption and lower cost. Experimental results show that they achieve equal or even better performance than general-purpose processors.
