Internal Dictionary Matching

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary $\mathcal{D}$ in fragments of a given string $T$ of length $n$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, $\mathcal{D}$ takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total length, which could be $\Theta(n\cdot d)$. In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from $\mathcal{D}$ in a fragment $T[i..j]$, and reporting the distinct patterns from $\mathcal{D}$ that occur in $T[i..j]$. We show how to construct, in $\mathcal{O}((n+d) \log^{\mathcal{O}(1)} n)$ time, a data structure that answers each of these queries in $\mathcal{O}(\log^{\mathcal{O}(1)} n+|\mathit{output}|)$ time. Counting occurrences is much more involved than reporting them and requires combining locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, relies on the structure of maximal repetitions in strings. Finally, we provide upper and lower bounds for the case of a dynamic dictionary that are tight up to subpolynomial factors.
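
To make the three query types concrete, the following is a naive brute-force baseline in Python. It is only a sketch of what the queries return, not the data structure described above: it scans the fragment for every pattern and therefore takes time polynomial in $n$ and $d$ per query. The function names, the 0-based inclusive indexing, and the representation of $\mathcal{D}$ as a list of `(start, end)` fragments of $T$ are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a fragment T[a..b] is stored as an inclusive, 0-based pair (a, b).
Fragment = Tuple[int, int]


def report_occurrences(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Return (pattern index, start position) for every occurrence of a
    dictionary pattern that lies entirely inside T[i..j]."""
    out = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]
        m = len(pattern)
        # A start position s is valid only if s + m - 1 <= j.
        for s in range(i, j - m + 2):
            if T[s:s + m] == pattern:
                out.append((k, s))
    return out


def count_occurrences(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Count all occurrences of dictionary patterns inside T[i..j]."""
    return len(report_occurrences(T, D, i, j))


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> List[str]:
    """Return the distinct dictionary patterns (as strings) occurring in T[i..j]."""
    return sorted({T[D[k][0]:D[k][1] + 1] for k, _ in report_occurrences(T, D, i, j)})


if __name__ == "__main__":
    T = "abaab"
    D = [(0, 1), (0, 0)]  # the patterns "ab" and "a", given as fragments of T
    print(report_occurrences(T, D, 1, 4))
    print(count_occurrences(T, D, 1, 4))
    print(report_distinct(T, D, 1, 4))
```

Note that each pattern occupies constant space (two integers) regardless of its length, which is the sense in which the dictionary takes space proportional to $d$.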
