A survey of practical algorithms for suffix tree construction in external memory

The construction of suffix trees in secondary storage was considered impractical due to its excessive I/O cost. Algorithms developed in the last decade show that a suffix tree can be built efficiently in secondary storage for inputs that fit in main memory. In this paper, we analyze the details of algorithmic approaches to external-memory suffix tree construction and compare the performance and scalability of existing state-of-the-art software based on these algorithms. Copyright © 2010 John Wiley & Sons, Ltd.
