Efficient Data Deduplication System Considering File Modification Pattern

In a data deduplication system, the performance of deduplication algorithms varies with file contents. For example, if a file is modified at its end, the Fixed-length Chunking algorithm is superior to Variable-length Chunking in computation time while achieving similar space reduction. Therefore, it is important for a deduplication system to predict which location of a file has been modified. In this paper, we discuss a new approach to one of the key methods that is invariably applied to data deduplication. The essential idea is to exploit an efficient file pattern checking scheme that can be used for data deduplication. The contribution of this paper is a method for finding which region of a file has been modified, using file similarity information. The file modification pattern can then be used to refine a data deduplication system by guiding its selection of a deduplication algorithm. Experimental results show that the proposed system can predict the file modification region with high probability.
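The effect of the modification location on Fixed-length Chunking can be illustrated with a minimal sketch. It is not the scheme proposed in this paper: it substitutes a plain comparison of fixed-length chunk hashes for the file similarity information. The chunk size, the function names (`chunk_hashes`, `duplicate_ratio`, `modified_region`, `choose_algorithm`), the three-way region split, and the selection rule are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical chunk size, kept tiny for illustration


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-length chunks and hash each chunk."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def duplicate_ratio(old: bytes, new: bytes) -> float:
    """Fraction of chunks in `new` whose hash already occurs in `old`."""
    known = set(chunk_hashes(old))
    hashes = chunk_hashes(new)
    if not hashes:
        return 0.0
    return sum(h in known for h in hashes) / len(hashes)


def modified_region(old: bytes, new: bytes) -> str:
    """Classify the position of the first differing chunk.

    Returns 'none', 'front', 'middle', or 'end'. The split into thirds
    is an assumption, not a definition taken from the paper.
    """
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    if old_h == new_h:
        return "none"
    # If one hash list is a prefix of the other, the change starts
    # where the shorter list ends.
    first_diff = min(len(old_h), len(new_h))
    for i, (a, b) in enumerate(zip(old_h, new_h)):
        if a != b:
            first_diff = i
            break
    position = first_diff / max(len(old_h), len(new_h))
    if position < 1 / 3:
        return "front"
    if position < 2 / 3:
        return "middle"
    return "end"


def choose_algorithm(region: str) -> str:
    """Assumed rule: fixed-length chunking when earlier chunks are untouched."""
    return "fixed" if region in ("end", "none") else "variable"


if __name__ == "__main__":
    old = bytes(range(64))        # 8 distinct chunks of 8 bytes
    appended = old + b"TAILDATA"  # modification at the end of the file
    prepended = b"X" + old        # one byte inserted at the front
    for name, new in (("appended", appended), ("prepended", prepended)):
        region = modified_region(old, new)
        print(name, region, choose_algorithm(region),
              round(duplicate_ratio(old, new), 2))
```

In this toy input, the appended version reuses 8 of its 9 fixed-length chunks, whereas inserting a single byte at the front shifts every chunk boundary and no chunk is reused. This is why knowing the modified region matters when choosing between Fixed-length and Variable-length Chunking.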