Analyzing Relative Lempel-Ziv Reference Construction

Relative Lempel-Ziv is a popular algorithm designed to compress sets of strings relative to a given reference string, which acts as a kind of dictionary. It can still be applied even when there is no obvious natural reference string for a dataset, by sampling substrings from the dataset and concatenating them to obtain an artificial reference. This works well in practice, but a theoretical analysis has been lacking. In this paper we provide such an analysis and verify it experimentally.
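
To make the construction concrete, the following is a minimal sketch of the idea described above, not the implementation analyzed in the paper. It assumes uniform random sampling of fixed-length substrings and a naive greedy longest-match parse. The function names (`build_reference`, `rlz_parse`, `rlz_decode`) and the parameters `sample_len`, `num_samples`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def build_reference(strings, sample_len, num_samples, seed=0):
    """Concatenate randomly sampled substrings of the dataset (assumed scheme)."""
    rng = random.Random(seed)
    text = "".join(strings)
    pieces = []
    for _ in range(num_samples):
        # Pick a start so that a full-length sample fits whenever possible.
        start = rng.randrange(max(1, len(text) - sample_len + 1))
        pieces.append(text[start:start + sample_len])
    return "".join(pieces)


def rlz_parse(s, reference):
    """Greedily parse s into phrases relative to reference.

    A phrase is (position, length) for a copy from the reference,
    or (character, 0) for a character that does not occur in it.
    """
    phrases = []
    i = 0
    while i < len(s):
        best_pos, best_len = -1, 0
        length = 1
        # Extend the match one character at a time (naive, quadratic search).
        while i + length <= len(s):
            pos = reference.find(s[i:i + length])
            if pos < 0:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            phrases.append((s[i], 0))  # literal character
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


def rlz_decode(phrases, reference):
    """Rebuild the original string from its phrases and the reference."""
    out = []
    for first, length in phrases:
        out.append(first if length == 0 else reference[first:first + length])
    return "".join(out)
```

For example, parsing `"cdexab"` against the reference `"abcde"` yields the phrases `(2, 3)`, `("x", 0)`, `(0, 2)`. A practical implementation would typically replace the repeated `str.find` calls with an index over the reference, such as a suffix array.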