Inferring Strings from Lyndon Factorization

The Lyndon factorization of a string w is the unique factorization \(\ell_1^{p_1}, \ldots, \ell_m^{p_m}\) of w such that \(\ell_1, \ldots, \ell_m\) is a sequence of Lyndon words that is strictly decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem for Lyndon factorization: given a sequence \(S = ((s_1, p_1), \ldots, (s_m, p_m))\) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to S. That is, the Lyndon factorization of w has the form \(\ell_1^{p_1}, \ldots, \ell_m^{p_m}\) with \(|\ell_i| = s_i\) for all \(1 \le i \le m\). First, we show that a simple O(n)-time algorithm exists if the alphabet size is unbounded, where \(n = \sum_{i=1}^{m} s_i p_i\) is the length of the output string. Second, we present an O(n)-time algorithm that computes a string over an alphabet of the smallest possible size. Third, we show how to compute only the size of this smallest alphabet in O(m) time. Fourth, we give an O(m)-time algorithm that computes an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.
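To illustrate the unbounded-alphabet case, here is a minimal sketch of one possible construction. It is not necessarily the algorithm from the paper. It assumes integer letters, and the helper names infer_unbounded and signature are hypothetical. The construction relies on the Chen, Fox and Lyndon theorem: any factorization of w into a non-increasing sequence of Lyndon words is the Lyndon factorization. Each \(\ell_i\) is built as the letter m − i followed by \(s_i - 1\) copies of the letter m. Because its first letter is strictly smaller than every other letter, \(\ell_i\) is a Lyndon word, and the distinct first letters guarantee \(\ell_1 > \cdots > \ell_m\). Duval's linear-time algorithm then checks the result. The construction runs in O(n) time but uses m + 1 letters, so it makes no attempt to minimize the alphabet.

```python
from itertools import groupby


def lyndon_factorization(w):
    """Duval's algorithm: split w into non-increasing Lyndon factors in O(n) time."""
    n, i = len(w), 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend while w[i:j] is a prefix of a power of some Lyndon word.
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        # Emit copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


def signature(w):
    """Hypothetical helper: map w to its (|l_i|, p_i) pairs by grouping equal factors."""
    return [(len(f), sum(1 for _ in grp))
            for f, grp in groupby(lyndon_factorization(w))]


def infer_unbounded(S):
    """Hypothetical helper (assumption: integer alphabet {0, ..., m}).

    For 1-indexed i, l_i is the letter m - i followed by s_i - 1 copies of m.
    The first letter of l_i is its unique smallest letter, so l_i is Lyndon.
    The distinct first letters m-1 > m-2 > ... > 0 give l_1 > l_2 > ... > l_m.
    """
    m = len(S)
    w = []
    for i, (s, p) in enumerate(S, start=1):
        if s < 1 or p < 1:
            raise ValueError("lengths and exponents must be positive")
        word = [m - i] + [m] * (s - 1)
        w.extend(word * p)
    return w


if __name__ == "__main__":
    S = [(3, 2), (1, 1), (2, 3)]
    w = infer_unbounded(S)
    print(w)                  # [2, 3, 3, 2, 3, 3, 1, 0, 3, 0, 3, 0, 3]
    print(signature(w) == S)  # True
```

Because lyndon_factorization only compares letters, it also works on ordinary Python strings. For example, "banana" factors as b, an, an, a.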
