Optimal prefix and suffix queries on texts

In this paper, we study a restricted version of the position-restricted pattern matching problem introduced and studied by Makinen and Navarro [V. Makinen, G. Navarro, Position-restricted substring searching, in: J.R. Correa, A. Hevia, M.A. Kiwi (Eds.), LATIN, in: Lecture Notes in Computer Science, vol. 3887, Springer, 2006, pp. 703-714]. In the problem handled here, we are interested only in those occurrences of the pattern that lie in a given suffix or a given prefix of the text. We achieve optimal query time for this problem using a data structure that extends the classic suffix tree; the time and space required to build it are dominated by those of the suffix tree itself. Notably, the best algorithm of Makinen and Navarro, if applied to our problem, gives sub-optimal query time, and its data structure also requires more time and space.
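To make the two query types concrete, here is a minimal Python sketch. It assumes 0-based positions, a "suffix query" that asks for the occurrences lying entirely inside `text[i:]`, and a "prefix query" that asks for the occurrences lying entirely inside `text[:k]`. It is not the authors' construction. It swaps the suffix tree for a naively built suffix array and uses a sparse-table range-maximum structure, a standard technique, to report only the qualifying occurrences. Locating the pattern's suffix-array interval by binary search costs O(m log n) rather than the O(m) of a suffix-tree walk, so the sketch illustrates only the output-sensitive reporting step, not the paper's optimal bound. All names (`PrefixSuffixIndex`, `SparseTable`, `report_at_least`, `suffix_query`, `prefix_query`) are hypothetical.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array (0-based start positions); O(n^2 log n), fine for a sketch."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Half-open rank interval [lo, hi) of suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first rank whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first rank whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


class SparseTable:
    """Static range-argmax: O(n log n) build, O(1) query."""

    def __init__(self, keys: List[int]):
        self.keys = keys
        n = len(keys)
        # levels[j][i] is the index of the largest key in keys[i : i + 2**j].
        self.levels = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev, half = self.levels[-1], 1 << (j - 1)
            self.levels.append([self._better(prev[i], prev[i + half])
                                for i in range(n - (1 << j) + 1)])
            j += 1

    def _better(self, a: int, b: int) -> int:
        return a if self.keys[a] >= self.keys[b] else b

    def argmax(self, lo: int, hi: int) -> int:
        """Index of the largest key in keys[lo..hi], inclusive, lo <= hi."""
        j = (hi - lo + 1).bit_length() - 1
        return self._better(self.levels[j][lo], self.levels[j][hi - (1 << j) + 1])


def report_at_least(rmq: SparseTable, lo: int, hi: int, threshold: int) -> List[int]:
    """All keys >= threshold among ranks lo..hi-1, using O(1 + output) RMQs."""
    found, stack = [], [(lo, hi - 1)]
    while stack:
        a, b = stack.pop()
        if a > b:
            continue
        r = rmq.argmax(a, b)
        if rmq.keys[r] < threshold:
            continue  # the maximum fails, so nothing in [a, b] qualifies
        found.append(rmq.keys[r])
        stack.append((a, r - 1))
        stack.append((r + 1, b))
    return found


class PrefixSuffixIndex:
    def __init__(self, text: str):
        self.text = text
        self.sa = build_suffix_array(text)
        self.max_pos = SparseTable(self.sa)                # maximizes p
        self.min_pos = SparseTable([-p for p in self.sa])  # maximizes -p, i.e. minimizes p

    def suffix_query(self, pattern: str, i: int) -> List[int]:
        """Starts of occurrences lying inside text[i:], i.e. p >= i."""
        lo, hi = sa_interval(self.text, self.sa, pattern)
        return sorted(report_at_least(self.max_pos, lo, hi, i))

    def prefix_query(self, pattern: str, k: int) -> List[int]:
        """Starts of occurrences inside text[:k]: p + m <= k, i.e. -p >= m - k."""
        lo, hi = sa_interval(self.text, self.sa, pattern)
        return sorted(-v for v in report_at_least(self.min_pos, lo, hi, len(pattern) - k))


if __name__ == "__main__":
    idx = PrefixSuffixIndex("abracadabra")
    print(idx.suffix_query("a", 4))      # [5, 7, 10]
    print(idx.prefix_query("abra", 10))  # [0]
```

Within the pattern's interval, a start position `p` qualifies for a suffix query iff `p >= i`, and for a prefix query iff `p + m <= k`; negating positions turns the second condition into a maximum query as well. Each range-maximum query either reports a new occurrence or closes off an interval, so the reporting step does O(1 + occ) work. The final `sorted` call is only for readable output.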

[1] Amihood Amir et al. Faster Two Dimensional Scaled Matching. 2008, Algorithmica.

[2] Richard Cole et al. Verifying candidate matches in sparse and wildcard matching. 2002, STOC '02.

[3] Amihood Amir et al. Faster two-dimensional pattern matching with rotations. 2006, Theor. Comput. Sci.

[4] Esko Ukkonen. A linear-time algorithm for finding approximate shortest common superstrings. 2005, Algorithmica.

[5] Roberto Grossi et al. Fast incremental text editing. 1995, SODA '95.

[6] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. 1965.

[7] Gad M. Landau et al. An Efficient Algorithm for the All Pairs Suffix-Prefix Problem. 1992, Inf. Process. Lett.

[8] Daniel S. Hirschberg et al. Algorithms for the Longest Common Subsequence Problem. 1977, JACM.

[9] Costas S. Iliopoulos et al. Indexing Circular Patterns. 2008, WALCOM.

[10] Costas S. Iliopoulos et al. Indexing Factors with Gaps. 2007, Algorithmica.

[11] Gonzalo Navarro et al. Compressed full-text indexes. 2007, CSUR.

[12] Martin Farach-Colton et al. Optimal Suffix Tree Construction with Large Alphabets. 1997, FOCS.

[13] Gonzalo Navarro et al. Dynamic Entropy-Compressed Sequences and Full-Text Indexes. 2006, CPM.

[14] Costas S. Iliopoulos et al. Faster index for property matching. 2008, Inf. Process. Lett.

[15] Gad M. Landau et al. Text Indexing and Dictionary Matching with One Error. 2000, J. Algorithms.

[16] Maxime Crochemore et al. Improved Algorithms for the Range Next Value Problem and Applications. 2008, STACS.

[17] Alberto Apostolico et al. The Myriad Virtues of Subword Trees. 1985.

[18] Gonzalo Navarro et al. Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations. 2002, CPM.

[19] Gad M. Landau et al. Two-dimensional pattern matching with rotations. 2004, Theor. Comput. Sci.

[20] Moshe Lewenstein et al. Combinatorial Pattern Matching: 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings (Lecture Notes in Computer Science). 2006.

[21] Michael A. Bender et al. The LCA Problem Revisited. 2000, LATIN.

[22] M. Crochemore et al. On-line construction of suffix trees. 2002.

[23] Gad M. Landau et al. Parallel Suffix-Prefix-Matching Algorithm and Applications. 1996, SIAM J. Comput.

[24] M. Fischer et al. String-Matching and Other Products. 1974.

[25] Kunihiko Sadakane et al. Succinct data structures for flexible text retrieval systems. 2007, J. Discrete Algorithms.

[26] Johannes Nowak et al. Text indexing with errors. 2007, J. Discrete Algorithms.

[27] Ron Y. Pinter et al. Efficient String Matching with Don't-Care Patterns. 1985.

[28] Arthur M. Lesk. Computational Molecular Biology: Sources and Methods for Sequence Analysis. 1989.

[29] Edward M. McCreight et al. A Space-Economical Suffix Tree Construction Algorithm. 1976, JACM.

[30] Richard Cole et al. Dictionary matching and indexing with errors and don't cares. 2004, STOC '04.

[31] A. Lesk. Computational Molecular Biology. 1988, Proceeding of Data For Discovery.

[32] Donald E. Knuth et al. Fast Pattern Matching in Strings. 1977, SIAM J. Comput.

[33] Ugur Dogrusoz et al. Combinatorial Pattern Matching: 15th Annual Symposium, CPM 2004, Istanbul, Turkey, July 5-7, 2004, Proceedings (Lecture Notes in Computer Science). 2004.

[34] Ming Gu et al. An efficient algorithm for dynamic text indexing. 1994, SODA '94.

[35] Costas S. Iliopoulos et al. Indexing Factors in DNA/RNA Sequences. 2008, BIRD.

[36] Gad M. Landau et al. Scaled and permuted string matching. 2004, Inf. Process. Lett.

[37] Erkki Sutinen et al. Indexing text with approximate q-grams. 2000, J. Discrete Algorithms.

[38] Amihood Amir et al. Faster Two Dimensional Pattern Matching with Rotations. 2004, CPM.

[39] Moshe Lewenstein et al. Real scaled matching. 2000, SODA '00.

[40] Tao Jiang et al. Linear approximation of shortest superstrings. 1991, STOC '91.

[41] Dan Gusfield et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. 1997.

[42] Eugene W. Myers et al. Combinatorial algorithms for DNA sequence assembly. 1995, Algorithmica.

[43] Esko Ukkonen et al. A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings. 1988, Theor. Comput. Sci.

[44] Costas S. Iliopoulos et al. Finding Patterns with Variable Length Gaps or Don't Cares. 2006, COCOON.

[45] Gonzalo Navarro et al. Dynamic entropy-compressed sequences and full-text indexes. 2006, TALG.

[46] Peter Weiner et al. Linear Pattern Matching Algorithms. 1973, SWAT.

[47] Esko Ukkonen et al. On-line construction of suffix trees. 1995, Algorithmica.

[48] Gad M. Landau et al. Efficient pattern matching with scaling. 1990, SODA '90.

[49] S. Muthukrishnan et al. Efficient algorithms for document retrieval problems. 2002, SODA '02.

[50] Gonzalo Navarro et al. Position-Restricted Substring Searching. 2006, LATIN.

[51] Volker Heun et al. A New Succinct Representation of RMQ-Information and Improvements in the Enhanced Suffix Array. 2007, ESCAPE.

[52] Amihood Amir et al. Faster Two Dimensional Scaled Matching. 2006, CPM.

[53] Wojciech Rytter et al. Jewels of stringology. 2002.

[54] Wing-Kai Hon et al. Compressed Index for a Dynamic Collection of Texts. 2004, CPM.

[55] Maxime Crochemore et al. Finding Patterns In Given Intervals. 2007, Fundam. Informaticae.

[56] Robert E. Tarjan et al. Scaling and related techniques for geometry problems. 1984, STOC '84.