On the Size of Lempel-Ziv and Lyndon Factorizations

Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structures are very different. In this paper, we establish the first direct connection between the two. We show that the Lyndon factorization can be larger than the non-overlapping LZ factorization, which we demonstrate by describing a new, non-trivial family of strings, but that it is always less than twice the size of the non-overlapping LZ factorization.
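
To make the two quantities concrete, the following is a minimal sketch that computes both factorizations of a string and compares their sizes. It is not taken from the paper: the function names are hypothetical, the Lyndon factorization is computed with Duval's algorithm, and the LZ routine is a simple brute-force search rather than an efficient construction. It also assumes two conventions that the abstract does not spell out: that the size of the Lyndon factorization counts distinct factors (a run of equal consecutive factors counts once), and that each non-overlapping LZ factor is either a fresh letter or the longest prefix of the remaining text that occurs entirely within the part already factorized.

```python
from itertools import groupby


def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words as far as possible.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # The period j - k is the length of the Lyndon word; emit each full copy.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def lz_nonoverlapping(s):
    """Brute-force non-overlapping LZ factorization (assumed variant, see lead-in)."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        length = 0
        # Grow the factor while it still occurs entirely inside s[:i].
        while i + length < n and s[i:i + length + 1] in s[:i]:
            length += 1
        length = max(length, 1)  # a letter with no earlier occurrence is its own factor
        factors.append(s[i:i + length])
        i += length
    return factors


def sizes(s):
    """Return (m, z): distinct Lyndon factors and number of LZ factors."""
    # Equal Lyndon factors are always adjacent, so groupby counts the distinct ones.
    m = sum(1 for _ in groupby(lyndon_factorization(s)))
    z = len(lz_nonoverlapping(s))
    return m, z


if __name__ == "__main__":
    for text in ["banana", "abab", "aaaaaaaa", "abracadabra"]:
        m, z = sizes(text)
        print(text, "Lyndon:", m, "LZ:", z, "m < 2z:", m < 2 * z)
```

For "banana", this sketch gives the Lyndon factors b, an, an, a (three distinct factors) and the LZ factors b, a, n, an, a (five factors), which is consistent with the stated bound under the conventions assumed here.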
