On the Complexity of String Matching for Graphs

Exact string matching in labeled graphs is the problem of finding paths of a graph G = (V,E) such that the concatenation of their node labels equals a given pattern string P[1..m]. This basic problem lies at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks. We prove a conditional lower bound stating that, for any constant ε > 0, an O(|E|^{1−ε} m)-time or an O(|E| m^{1−ε})-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. This holds even when the problem is restricted to undirected graphs with maximum node degree two, i.e. to zig-zag matching in bidirectional strings, or to deterministic directed acyclic graphs whose nodes have a maximum sum of indegree and outdegree of three. These restricted cases make the lower bound stricter than what can be directly derived from related bounds on regular expression matching (Backurs and Indyk, FOCS'16). In fact, our bounds are tight in the sense that lowering the degree or the alphabet size yields linear-time solvable problems. An interesting corollary is that exact and approximate matching are equally hard (quadratic time) in graphs under SETH. In comparison, the same problems restricted to strings have linear-time and quadratic-time solutions, respectively, with approximate pattern matching also having a matching SETH lower bound (Backurs and Indyk, STOC'15).

2012 ACM Subject Classification: Theory of computation → Pattern matching
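For orientation, the quadratic upper bound that the lower bound is measured against can be sketched as a simple frontier propagation: keep the set of nodes at which some match of the current pattern prefix can end, and extend it along edges one pattern character at a time. The sketch below is illustrative only and is not taken from the paper; the function name `has_match`, the dictionary-based graph representation, and the assumption that a matching path may revisit nodes (i.e. is a walk) are all choices made here for the example.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def has_match(
    labels: Dict[Node, str],
    edges: Iterable[Tuple[Node, Node]],
    pattern: str,
) -> bool:
    """Return True if some walk in the graph spells `pattern`.

    Assumptions (not from the source): each node carries a single-character
    label, edges are directed pairs (u, v), and nodes may be revisited.
    """
    if not pattern:
        return True  # the empty pattern matches trivially

    # Adjacency lists: succ[u] holds every v with an edge u -> v.
    succ: Dict[Node, List[Node]] = {v: [] for v in labels}
    for u, v in edges:
        succ[u].append(v)

    # frontier = nodes where a walk spelling the current prefix can end.
    frontier: Set[Node] = {v for v, c in labels.items() if c == pattern[0]}

    for ch in pattern[1:]:
        # Extend every partial match by one edge whose target has label ch.
        frontier = {w for u in frontier for w in succ[u] if labels[w] == ch}
        if not frontier:
            return False

    return bool(frontier)


# Directed path a -> b -> a.
labels = {0: "a", 1: "b", 2: "a"}
edges = [(0, 1), (1, 2)]
print(has_match(labels, edges, "aba"))
print(has_match(labels, edges, "abb"))

# Zig-zag matching: an undirected edge is modeled as two directed edges.
zz_labels = {0: "a", 1: "b"}
zz_edges = [(0, 1), (1, 0)]
print(has_match(zz_labels, zz_edges, "abab"))
```

Each of the m rounds inspects every edge at most once, so the sketch runs in O(|V| + |E| m) time. The conditional lower bound in the abstract says that, under SETH, neither factor of the |E| m term can be improved by a polynomial amount.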
