Efficient multi-resolution compression algorithm for disk-based backup and recovery

In this paper, we address the problem of improving backup and recovery performance by compressing redundancies in large disk-based backup systems. We analyze several general-purpose compression algorithms and evaluate their scalability and applicability in this setting. We investigate how redundant data is distributed across the whole system and propose a multi-resolution distributed compression algorithm that detects duplicate data at file, block, or byte granularity to reduce redundancy in the backup environment. To accelerate recovery, we propose a synthetic backup solution that stores data in a recovery-oriented layout and composes the final recovery image on the back-end backup server. Experiments show that the algorithm greatly reduces bandwidth consumption, saves storage cost, and shortens backup and recovery time. We have implemented these techniques in our product, the H-info backup system, which achieves over a 10x compression ratio in both network traffic and data storage during backup.
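To make the multi-resolution idea concrete, the sketch below shows a toy two-tier deduplication index in Python: a file-level fingerprint check first, falling back to block-level fingerprints, with byte-level delta encoding noted but omitted. The fixed 4 KiB block size, the SHA-256 fingerprints, and the `DedupIndex` class are illustrative assumptions; the paper does not specify its actual chunking or hashing scheme.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the paper does not specify chunking


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Toy index of known fingerprints at file and block granularity."""

    def __init__(self):
        self.file_hashes = set()
        self.block_hashes = set()

    def deduplicate(self, data: bytes) -> list:
        """Return backup instructions: references for known data, literals for new data."""
        # File level: if the whole file is already known, send only a reference.
        fh = sha256(data)
        if fh in self.file_hashes:
            return [("file_ref", fh)]
        self.file_hashes.add(fh)

        # Block level: reference known blocks, transmit new ones as literals.
        out = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            bh = sha256(block)
            if bh in self.block_hashes:
                out.append(("block_ref", bh))
            else:
                self.block_hashes.add(bh)
                # Byte level would delta-encode this new block against a
                # similar known block; omitted here for brevity.
                out.append(("literal", block))
        return out


# Usage: a repeated backup of identical data collapses to a single file reference.
idx = DedupIndex()
payload = b"example payload " * 1000
first = idx.deduplicate(payload)   # mostly literals on the first pass
second = idx.deduplicate(payload)  # one file_ref on the second pass
assert second == [("file_ref", sha256(payload))]
```

In a distributed deployment, such an index would be consulted at the client before transmission, so that only references and new literals cross the network, which is the source of the bandwidth savings reported above.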