Reducing space for index implementation

This article considers several strategies for efficiently implementing full indexes on raw textual data. The indexes are based on representations of all the suffixes of the original text, and we describe three types of implementation aimed at reducing the memory space they require. The first method combines compaction and minimization, which leads to the compact suffix automaton. The second method shows that considering a complement language can be useful, especially when it is related to data compression. The third technique approximates the set of suffixes in order to reduce the space of the implementation.
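
To make the first method more concrete, the following is a minimal sketch of a plain (non-compacted) suffix automaton, built with the standard online construction. It is an illustration under stated assumptions, not the article's implementation: the names `build_suffix_automaton` and `contains` are hypothetical, and the compaction step that yields the compact suffix automaton is not shown.

```python
def build_suffix_automaton(text):
    """Build the minimal automaton accepting the suffixes of `text`.

    Illustrative sketch only. States are integers; state 0 is initial.
    Returns (trans, link, length, last), where `last` is the state
    reached after reading the whole text.
    """
    trans = [{}]      # trans[s][ch] -> target state
    link = [-1]       # suffix link of each state
    length = [0]      # length of the longest string reaching each state
    last = 0
    for ch in text:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return trans, link, length, last


def contains(trans, pattern):
    """Return True if `pattern` occurs in the indexed text."""
    state = 0
    for ch in pattern:
        if ch not in trans[state]:
            return False
        state = trans[state][ch]
    return True
```

For example, after `trans, link, length, last = build_suffix_automaton("banana")`, the call `contains(trans, "nan")` follows one transition per pattern letter, so the query time does not depend on the length of the text. The number of states stays linear in the text length, which is what makes the structure a candidate for further space reduction.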
