On hardness of several string indexing problems

Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a collection of $D$ string documents of $n$ characters in total. The two-pattern matching problems ask to index $\mathcal{D}$ so that the following queries can be answered efficiently:

- Report/count the unique documents containing $P_1$ and $P_2$.
- Report/count the unique documents containing $P_1$, but not $P_2$.

Here $P_1$ and $P_2$ are input patterns of length $p_1$ and $p_2$, respectively. Linear-space data structures with $O(p_1 + p_2 + \sqrt{nk} \log^{O(1)} n)$ query cost are already known for the reporting version, where $k$ is the output size. For the counting version (i.e., report the value $k$), a simple linear-space index with $O(p_1 + p_2 + \sqrt{n})$ query cost can be constructed in $O(n^{3/2})$ time. However, it is still not known whether these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as the common colors query problem.
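To make the two query types concrete, the following is a minimal brute-force sketch of their semantics only. It scans every document at query time, so it is not the indexed data structure discussed above and does not achieve the stated bounds. The function names (`docs_containing`, `report_both`, `report_forbidden`) and the sample documents are hypothetical, chosen purely for illustration.

```python
from typing import List, Set


def docs_containing(docs: List[str], pattern: str) -> Set[int]:
    """Return the indices of documents in which `pattern` occurs as a substring."""
    return {i for i, d in enumerate(docs) if pattern in d}


def report_both(docs: List[str], p1: str, p2: str) -> List[int]:
    """Query 1: documents containing both p1 and p2 (naive scan, no index)."""
    return sorted(docs_containing(docs, p1) & docs_containing(docs, p2))


def report_forbidden(docs: List[str], p1: str, p2: str) -> List[int]:
    """Query 2: documents containing p1 but not p2 (naive scan, no index)."""
    return sorted(docs_containing(docs, p1) - docs_containing(docs, p2))


# Hypothetical toy collection; document identifiers are list indices.
docs = ["banana", "bandana", "cabana", "nab"]

both = report_both(docs, "an", "nd")            # reporting version, query 1
forbidden = report_forbidden(docs, "an", "nd")  # reporting version, query 2

# The counting versions return only the output size k.
count_both = len(both)
count_forbidden = len(forbidden)
```

Each document is counted at most once regardless of how many times a pattern occurs in it, which is what "unique documents" refers to in the problem statement.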
