Suffix Trees and their Applications in String Algorithms

The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching.
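To make the definition concrete, the following is a minimal sketch of a compacted trie over all suffixes of a string, with a substring query on top of it. It uses naive quadratic-time insertion, not any of the efficient construction algorithms the paper surveys. The names `Node`, `build_suffix_tree`, `contains`, and `count_leaves` are illustrative, and the sketch assumes the terminator character `$` does not occur in the text.

```python
class Node:
    """A suffix tree node; each outgoing edge is keyed by its first character."""

    def __init__(self):
        # first character -> (edge label, child node)
        self.children = {}


def _insert(root, suffix):
    """Insert one suffix, splitting an edge where the suffix diverges from it."""
    node = root
    while suffix:
        first = suffix[0]
        if first not in node.children:
            # No edge starts with this character: hang a new leaf here.
            node.children[first] = (suffix, Node())
            return
        label, child = node.children[first]
        # Length of the common prefix of the edge label and the suffix.
        k = 0
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k < len(label):
            # Divergence inside the edge: split it at position k.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[first] = (label[:k], mid)
            child = mid
        node = child
        suffix = suffix[k:]


def build_suffix_tree(text, terminator="$"):
    """Naive O(n^2) construction; assumes `terminator` is absent from `text`."""
    s = text + terminator
    root = Node()
    for i in range(len(s)):
        _insert(root, s[i:])
    return root


def contains(root, pattern):
    """Return True if `pattern` occurs as a substring of the indexed text."""
    node = root
    while pattern:
        entry = node.children.get(pattern[0])
        if entry is None:
            return False
        label, child = entry
        k = min(len(label), len(pattern))
        if label[:k] != pattern[:k]:
            return False
        pattern = pattern[k:]
        node = child
    return True


def count_leaves(node):
    """Count leaves; there is one leaf per suffix of the terminated text."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for _, child in node.children.values())


tree = build_suffix_tree("banana")
print(contains(tree, "nan"), contains(tree, "nab"), count_leaves(tree))
```

Because every substring of the text is a prefix of some suffix, a pattern query only walks down from the root, taking time proportional to the pattern length (with dictionary lookups at each node) regardless of the text length. Storing edge labels as explicit strings keeps the sketch short; practical implementations usually store index pairs into the text so that the tree uses linear space.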
