The macro model for data compression (Extended Abstract)

A general model for data compression is presented which includes most data compression systems in the literature as special cases. All macro schemes are based on the principle of finding redundant strings or patterns and replacing them by pointers to a common copy. Different varieties of macro schemes may be defined by varying the interpretation of pointers; for instance, a pointer may indicate a substring of the compressed string, a substring of the original string, or a substring of some other string such as an external dictionary. Other varieties of macro schemes may be defined by restricting the type of overlapping or recursion that may be used. Trade-offs between different varieties of macro schemes, exact lower bounds on the amount of compression obtainable, and the complexity of encoding and decoding are discussed, as well as how the work of other authors (such as Lempel-Ziv) relates to this model.
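
As an informal illustration of the pointer idea, the following Python sketch treats a compressed string as a sequence of literal characters and pointers into the original string. It is not the formal definition used in this paper: the token representation as (start, length) pairs, the function names encode and decode, the greedy matching strategy, and the min_len threshold are all assumptions made for the example. Pointers here are allowed to overlap the text they produce, which is one of the features a restricted scheme might forbid.

```python
from typing import List, Tuple, Union

# Hypothetical token type: a literal character, or a (start, length)
# pointer into the original (decoded) string.
Token = Union[str, Tuple[int, int]]


def decode(tokens: List[Token]) -> str:
    """Expand literals and pointers back into the original string."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            start, length = tok
            # Copy one character at a time so that a pointer may
            # overlap the text it is currently producing.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def encode(text: str, min_len: int = 3) -> List[Token]:
    """Greedy sketch: replace each repeat of at least min_len characters
    by a pointer to the earliest longest match starting before it."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        best_start, best_len = 0, 0
        for start in range(pos):
            length = 0
            while (pos + length < len(text)
                   and text[start + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= min_len:
            tokens.append((best_start, best_len))
            pos += best_len
        else:
            tokens.append(text[pos])
            pos += 1
    return tokens


if __name__ == "__main__":
    compressed = encode("abcabcabc")
    print(compressed)
    print(decode(compressed))
```

The quadratic search in encode is chosen for brevity only; pointing into the compressed string or into an external dictionary instead would change how decode resolves each pointer, not the overall shape of the sketch.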