Paper information - Differential compression method based on data de-duplication

Differential compression method based on data de-duplication

The invention discloses a differential (delta) compression method based on data de-duplication. The method comprises the following steps: partitioning the files in a data stream into multiple data blocks; computing a fingerprint for each data block, to be used for finding duplicate data; grouping the data blocks into data block groups and building a doubly linked list for each group; looking up the fingerprint of each data block in each group to determine whether the block is a duplicate, thereby performing de-duplication; searching each de-duplicated group locally for similar data, using the duplicate information recorded in the group's doubly linked list, so that non-duplicate blocks adjacent to duplicate blocks are treated as potentially similar blocks; verifying the similarity of these candidate blocks by differential compression; and finally running a complementary similarity search on the data block groups according to the verified similarity. The method offers fast similar-data detection, low computation and indexing overhead, and high data compression efficiency.
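The core idea in the abstract, that a non-duplicate block sitting next to a duplicate block is a likely delta-compression candidate, can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions: the function names `fingerprint`, `delta_size`, and `process_group` are hypothetical, SHA-1 is an arbitrary fingerprint choice, plain list indices stand in for the doubly linked list, each candidate is paired with the stored block at the same offset from the matched duplicate, zlib with a preset dictionary stands in for a real delta encoder, and the 0.5 acceptance threshold is arbitrary. Chunking and the final complementary search are omitted.

```python
import hashlib
import zlib


def fingerprint(block):
    """Fingerprint of a block (SHA-1 is an assumed choice)."""
    return hashlib.sha1(block).hexdigest()


def delta_size(target, base):
    """Size of `target` compressed against `base`.

    zlib with a preset dictionary is only a stand-in for a delta encoder.
    """
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def process_group(new_blocks, stored_blocks, threshold=0.5):
    """Deduplicate one group, then look for similar blocks next to duplicates.

    Returns (duplicates, similar); both map an index in `new_blocks`
    to an index in `stored_blocks`.
    """
    # Fingerprint index over the already stored blocks.
    index = {fingerprint(b): i for i, b in enumerate(stored_blocks)}

    # Step 1: de-duplication by fingerprint lookup (None means no duplicate).
    match = [index.get(fingerprint(b)) for b in new_blocks]
    duplicates = {i: m for i, m in enumerate(match) if m is not None}

    # Step 2: non-duplicate neighbours of a duplicate become candidates,
    # paired with the corresponding neighbour of the stored duplicate
    # (list adjacency plays the role of the doubly linked list here).
    similar = {}
    for i, m in duplicates.items():
        for step in (-1, 1):
            n, s = i + step, m + step
            if not (0 <= n < len(new_blocks) and 0 <= s < len(stored_blocks)):
                continue
            if match[n] is not None or n in similar:
                continue
            # Step 3: verify the candidate by actually delta-compressing it.
            if delta_size(new_blocks[n], stored_blocks[s]) < threshold * len(new_blocks[n]):
                similar[n] = s
    return duplicates, similar
```

In this sketch no extra similarity index (for example, a feature or sketch index) is built: candidates come for free from the de-duplication result, which is consistent with the low indexing overhead the abstract claims.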

冯丹 | 田磊 | 夏文 | 江泓 | 付忞