A survey of practical algorithms for suffix tree construction in external memory

The construction of suffix trees in secondary storage was considered impractical because of its excessive I/O cost. Algorithms developed in the last decade show that a suffix tree can be built efficiently in secondary storage for inputs that fit in main memory. In this paper, we analyze in detail the algorithmic approaches to external memory suffix tree construction and compare the performance and scalability of existing state-of-the-art software based on these algorithms. Copyright © 2010 John Wiley & Sons, Ltd.
