Dictionary-symbolwise flexible parsing

Linear-time optimal parsing algorithms are rare in the dictionary-based branch of data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999), which works when the dictionary is prefix-closed and the encoding of dictionary pointers has constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm, which is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypotheses. For LZ78-like algorithms with variable pointer costs and any symbolwise compressor that runs in linear time (as is usual), we show how to implement our parsing algorithm in linear time. For LZ77-like dictionaries and any symbolwise compressor, our algorithm can be implemented in O(n log n) time. We also present experimental results that show the effectiveness of the dictionary-symbolwise approach.
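
To make the notion of optimal parsing concrete, the sketch below treats a dictionary-symbolwise parse as a shortest path in a graph whose nodes are text positions: a symbolwise edge goes from i to i+1, and a dictionary edge goes from i to j whenever text[i:j] is a phrase. This is only the generic dynamic-programming baseline, not the Flexible Parsing algorithm or its dictionary-symbolwise extension, and it does not achieve the linear or O(n log n) bounds stated above. It assumes a static prefix-closed dictionary, and the names optimal_parse, dict_cost and sym_cost are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple


def optimal_parse(
    text: str,
    dictionary: Set[str],
    dict_cost: Callable[[str], float],
    sym_cost: Callable[[str], float],
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (total cost, parse) for a minimum-cost dictionary-symbolwise parse.

    Assumption: `dictionary` is static and prefix-closed, so scanning
    phrases starting at position i can stop at the first miss.
    """
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=0)
    best = [float("inf")] * (n + 1)  # best[j] = cheapest cost to encode text[:j]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0

    for i in range(n):
        # Symbolwise edge i -> i+1: always available, so every node is reachable.
        cost = best[i] + sym_cost(text[i])
        if cost < best[i + 1]:
            best[i + 1], back[i + 1] = cost, (i, "sym")

        # Dictionary edges i -> j for every phrase text[i:j] in the dictionary.
        for j in range(i + 1, min(n, i + max_len) + 1):
            phrase = text[i:j]
            if phrase not in dictionary:
                break  # prefix-closed: no longer phrase can match either
            cost = best[i] + dict_cost(phrase)
            if cost < best[j]:
                best[j], back[j] = cost, (i, "dict")

    # Walk the back-pointers from the end of the text to recover the parse.
    parse: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        i, kind = back[j]
        parse.append((kind, text[i:j]))
        j = i
    parse.reverse()
    return best[n], parse


if __name__ == "__main__":
    phrases = {"a", "ab", "aba", "b", "ba"}  # prefix-closed toy dictionary
    total, parse = optimal_parse("abab", phrases, lambda p: 2, lambda c: 3)
    print(total, parse)
```

Because edges only go forward, processing positions from left to right visits the graph in topological order, so one pass suffices. The running time is proportional to the number of edges, which is why faster algorithms need additional structure such as the prefix-closed property.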

[1] Thomas G. Szymanski, et al. Data compression via textual substitution, 1982.

[2] Guoan Bi, et al. Performance of selection diversity reception in correlated Rayleigh fading channels, 1998.

[3] Alan Hartman, et al. Optimal Parsing of Strings, 1985.

[4] Antonio Restivo, et al. Dictionary-Symbolwise Flexible Parsing, 2010, IWOCA.

[5] Peter M. Fenwick. Symbol Ranking Text Compression with Shannon Recodings, 1997, J. Univers. Comput. Sci.

[6] Shmuel Tomi Klein, et al. Efficient Optimal Recompression, 1997, Comput. J.

[7] R. Nigel Horspool. The effect of non-greedy parsing in Ziv-Lempel compression methods, 1995, Proceedings DCC '95 Data Compression Conference.

[8] Yossi Matias, et al. On the optimality of parsing in dynamic dictionary based data compression, 1999, SODA '99.

[9] Maxime Crochemore, et al. Pattern-matching and text-compression algorithms, 1996, CSUR.

[10] H. S. Heaps, et al. A comparison of algorithms for data base compression by use of fragments as language elements, 1974, Inf. Storage Retr.

[11] Martin Cohn, et al. Parsing with Prefix and Suffix Dictionaries, 1996, DCC 1996.

[12] Yossi Matias, et al. The Effect of Flexible Parsing for Dynamic Dictionary-Based Data Compression, 2001, JEAL.

[13] Martin Cohn, et al. Parsing with suffix and prefix dictionaries, 1996, Proceedings of Data Compression Conference - DCC '96.

[14] Tae Young Kim, et al. On-line optimal parsing in adaptive dictionary-based coding, 1998.

[15] Ian H. Witten, et al. The relationship between greedy parsing and symbolwise text compression, 1994, JACM.

[16] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[17] David Salomon, et al. Data Compression: The Complete Reference, 2006.

[18] Paolo Ferragina, et al. On the Bit-Complexity of Lempel-Ziv Compression, 2009, SIAM J. Comput.

[19] Lucian Ilie, et al. Computing Longest Previous Factor in linear time and applications, 2008, Inf. Process. Lett.

[20] Paolo Ferragina, et al. Text Compression, 2009, Encyclopedia of Database Systems.

[21] Jyrki Katajainen, et al. An analysis of the longest match and the greedy heuristics in text encoding, 1992, JACM.

[22] Jyrki Katajainen, et al. An Approximation Algorithm for Space-Optimal Encoding of a Text, 1989, Comput. J.

[23] Chiara Epifanio, et al. On the Suffix Automaton with Mismatches, 2007, CIAA.

[24] Robert A. Wagner, et al. Common phrases and minimum-space text storage, 1973, CACM.

[25] Chiara Epifanio, et al. Languages with Mismatches and an Application to Approximate Indexing, 2005, Developments in Language Theory.

[26] M. Waterman, et al. A Phase Transition for the Score in Matching Random Sequences Allowing Deletions, 1994.

[27] James A. Storer, et al. Data compression via textual substitution, 1982, JACM.

[28] W. Szpankowski. Average Case Analysis of Algorithms on Sequences, 2001.

[29] F. Mignosi, et al. Optimal Parsing in Dictionary-Symbolwise Data Compression Schemes, 2006.

[30] Antonio Restivo, et al. Languages with mismatches, 2007, Theor. Comput. Sci.

[31] Antonio Restivo, et al. Indexing Structures for Approximate String Matching, 2003, CIAC.

[32] M. Waterman, et al. The Erdős-Rényi Strong Law for Pattern Matching with a Given Proportion of Mismatches, 1989.

[33] Giovanni Manzini, et al. Compression of Low Entropy Strings with Lempel-Ziv Algorithms, 1999, SIAM J. Comput.