A Lossy Data Compression Based on String Matching: Preliminary Analysis and Suboptimal Algorithms

A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. The scheme is based on approximate string matching, and it naturally extends the lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm rests on a careful probabilistic analysis of an approximate string matching problem that is of independent interest. This analysis extends the Wyner-Ziv model to a lossy environment. In this conference version, we consider only the Bernoulli model (i.e., memoryless channel), but our results hold under much weaker probabilistic assumptions.
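To make the central primitive concrete, the following is a minimal sketch of approximate string matching as a Lempel-Ziv-style parsing step: find the longest prefix of the text still to be encoded that matches some substring of an already-seen database within a distortion threshold. The function names, the use of the per-symbol Hamming distance as the distortion measure, and the brute-force search are all illustrative assumptions, not the algorithm analyzed in the paper.

```python
def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length, non-empty strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def longest_approx_match(database, text, max_distortion):
    """Return (length, position) of the longest prefix of `text` that matches
    some substring of `database` with Hamming fraction <= max_distortion.

    Hypothetical brute-force helper for illustration only. Returns (0, -1)
    when no prefix of length >= 1 qualifies.
    """
    best_len, best_pos = 0, -1
    for pos in range(len(database)):
        limit = min(len(text), len(database) - pos)
        # Approximate matching is not monotone in the length, so try the
        # longest candidates first and only those that would beat the best.
        for length in range(limit, best_len, -1):
            candidate = database[pos:pos + length]
            if hamming_fraction(candidate, text[:length]) <= max_distortion:
                best_len, best_pos = length, pos
                break
    return best_len, best_pos
```

With `max_distortion = 0` this reduces to the exact longest-match step of lossless Lempel-Ziv parsing; a positive threshold allows longer matches at the cost of reconstruction errors, which is the trade-off the abstract refers to.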
