Constructing LZ78 tries and position heaps in linear time for large alphabets

We propose the first linear-time algorithm to compute the LZ78 trie over an integer alphabet. We propose a linear-time algorithm to construct the position heap of a trie. LZ78 tries and position heaps can be superimposed on the corresponding suffix trees. Both of them can be computed by nearest marked ancestor queries on suffix trees.

We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 (LZ78) factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time when the set of strings is given as a trie.
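To make the object being computed concrete, the following is a minimal sketch of the textbook LZ78 factorization, which builds the LZ78 trie directly with a dictionary per node. It is not the suffix-tree algorithm described above: it uses hashing for child lookups, so its linear running time is only expected, not worst-case. The function names `lz78_factorize` and `lz78_decode` and the `(index, character)` factor encoding are assumptions made for illustration.

```python
def lz78_factorize(text):
    """Return the LZ78 factors of `text` as (prefix_factor_index, next_char) pairs.

    Index 0 denotes the empty factor. Node ids in the trie coincide with
    1-based factor indices, because each factor creates exactly one node.
    """
    trie = [{}]      # trie[node] maps a character to a child node id; node 0 is the root
    factors = []
    node = 0         # current position in the trie while matching the next factor
    for ch in text:
        child = trie[node].get(ch)
        if child is not None:
            node = child                     # keep extending the current match
        else:
            trie.append({})                  # new trie node for the new factor
            trie[node][ch] = len(trie) - 1
            factors.append((node, ch))
            node = 0                         # restart matching from the root
    if node != 0:
        # The text ended inside an existing factor; emit it without a new character.
        factors.append((node, None))
    return factors


def lz78_decode(factors):
    """Rebuild the original text from the factor list (inverse of lz78_factorize)."""
    phrases = [""]                           # phrases[i] is the i-th factor; 0 is empty
    for index, ch in factors:
        phrases.append(phrases[index] + (ch or ""))
    return "".join(phrases[1:])
```

For example, `lz78_factorize("abab")` yields the factors `a`, `b`, `ab`, encoded as `(0, 'a')`, `(0, 'b')`, `(1, 'b')`. The point of the suffix-tree approach in the abstract is to avoid the per-node dictionary, which is what stands between this sketch and a worst-case linear bound over an integer alphabet.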