A Framework for Analyzing and Improving Content-Based Chunking Algorithms

We present a framework for analyzing content-based chunking algorithms, as used for example in the Low Bandwidth Networked File System. We use this framework to evaluate the basic sliding window algorithm and its two known variants. We develop a new chunking algorithm that performs significantly better than the known algorithms on real, non-random data.
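
For readers unfamiliar with the basic sliding window algorithm, the following is a minimal sketch of the general idea, not the implementation evaluated here. A fixed-width window slides over the data, and a chunk boundary is declared wherever the fingerprint of the window is congruent to a chosen residue modulo a divisor. The function name `bsw_chunks`, the parameter values, and the simple polynomial rolling hash (standing in for a Rabin fingerprint) are all assumptions made for illustration.

```python
def bsw_chunks(data: bytes, window: int = 16, divisor: int = 64, residue: int = 0) -> list:
    """Split data wherever the hash of the trailing `window` bytes
    equals `residue` modulo `divisor` (illustrative parameters)."""
    BASE = 257               # assumed hash base, not from the source
    MOD = (1 << 61) - 1      # assumed large prime modulus
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window

    chunks = []
    start = 0  # start offset of the current chunk
    h = 0      # rolling hash of the last `window` bytes
    for i, b in enumerate(data):
        if i >= window:
            # Remove the contribution of the byte that slides out.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + b) % MOD
        # Only test for a boundary once a full window has been seen.
        if i + 1 >= window and h % divisor == residue:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


# Deterministic sample input; any byte string would do.
sample = bytes((i * 131 + (i * i) % 251) % 256 for i in range(4096))
original = bsw_chunks(sample)
shifted = bsw_chunks(b"X" + sample)  # one byte inserted at the front
```

Because each boundary depends only on the bytes inside the window, inserting a byte at the front of `sample` can disturb only the first chunk; every later chunk reappears unchanged in `shifted`. This locality is what makes content-based chunking useful for detecting duplicate data after small edits.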