On-line construction of compact directed acyclic word graphs

Many index structures that provide efficient solutions to pattern-matching problems have been proposed. Examples include suffix trees and directed acyclic word graphs (DAWGs), both of which can be constructed in linear time and space. Compact directed acyclic word graphs (CDAWGs) are an index structure that preserves some features of both suffix trees and DAWGs while requiring less space than either. An algorithm that directly constructs CDAWGs in linear time and space was first introduced by Crochemore and Verin, based on McCreight's algorithm for constructing suffix trees. In this work, we present a novel on-line linear-time algorithm that builds the CDAWG for a single string as well as for a set of strings, inspired by Ukkonen's on-line algorithm for constructing suffix trees.
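To make the relationship between DAWGs and CDAWGs concrete, the following is a minimal Python sketch. It is not the algorithm presented in this work: it uses the classical on-line DAWG (suffix automaton) construction and then only counts which states would remain as nodes after compaction, rather than building the CDAWG directly. It assumes the text ends with a unique end-marker `$`, in which case the CDAWG nodes are the source, the sink, and every state with at least two outgoing transitions. The names `build_dawg`, `contains`, and `cdawg_node_count` are hypothetical and chosen for illustration.

```python
def build_dawg(text):
    """Build the DAWG (suffix automaton) of text on-line, one character at a time.

    Returns a list of transition dictionaries; state 0 is the source.
    """
    nxt = [{}]     # nxt[s][c] is the state reached from s on character c
    link = [-1]    # suffix links
    length = [0]   # length of the longest string reaching each state
    last = 0       # state representing the whole text read so far
    for ch in text:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt


def contains(nxt, pattern):
    """Return True if pattern is a substring of the indexed text."""
    state = 0
    for ch in pattern:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return True


def cdawg_node_count(nxt):
    """Count the states that survive compaction into a CDAWG.

    Assumes the indexed text ends with a unique end-marker, so that
    every non-source state with exactly one outgoing transition can be
    merged into the edge passing through it.
    """
    return 1 + sum(1 for s in range(1, len(nxt)) if len(nxt[s]) != 1)


if __name__ == "__main__":
    dawg = build_dawg("abcabcab$")
    print("DAWG states:", len(dawg))
    print("CDAWG nodes:", cdawg_node_count(dawg))
    print("contains 'cab':", contains(dawg, "cab"))
```

Pattern lookup takes time proportional to the pattern length in either structure. The compaction only removes states that act as pass-through points on a single path, which is where the space saving of the CDAWG over the DAWG comes from.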
