If the Current Clique Algorithms are Optimal, So is Valiant's Parser

The CFG recognition problem is: given a context-free grammar G and a string w of length n, decide if w can be obtained from G. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3 / \log^3 n)$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|G| \cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|G| = \Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the k-Clique problem: given a graph on n nodes, decide if there are k that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
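To make the recognition problem concrete, the following is a minimal sketch of the classical cubic-time dynamic program (in the style of Cocke-Younger-Kasami), not Valiant's matrix-multiplication-based parser. It assumes the grammar is already in Chomsky normal form; the function name `cyk_recognize`, the dictionary encoding of the rules, and the small Dyck-style example grammar are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def cyk_recognize(word, unary, binary, start="S"):
    """Decide whether `word` is derivable from `start` (hypothetical helper).

    unary:  maps a terminal a to the set of nonterminals A with a rule A -> a
    binary: maps a pair (B, C) to the set of nonterminals A with a rule A -> B C
    The grammar is assumed to be in Chomsky normal form.
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][j] holds every nonterminal that derives word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(unary.get(ch, ()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # substring start
            j = i + length - 1              # substring end
            for k in range(i, j):           # split point: [i..k] and [k+1..j]
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]


# Illustrative grammar for non-empty balanced parentheses:
#   S -> L R | L X | S S,   X -> S R,   L -> '(',   R -> ')'
DYCK_UNARY = {"(": {"L"}, ")": {"R"}}
DYCK_BINARY = {
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "S"): {"S"},
    ("S", "R"): {"X"},
}

if __name__ == "__main__":
    print(cyk_recognize("(())()", DYCK_UNARY, DYCK_BINARY))
    print(cyk_recognize("(()", DYCK_UNARY, DYCK_BINARY))
```

The three nested loops over `length`, `i`, and `k` give the cubic dependence on n for a constant-size grammar, which is the baseline that the bounds in the abstract are measured against.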

[1]  Lillian Lee,et al.  Fast context-free grammar parsing requires fast boolean matrix multiplication , 2001, JACM.

[2]  Dan Gusfield,et al.  Faster algorithms for RNA-folding using the Four-Russians method , 2013, Algorithms for Molecular Biology.

[3]  Timothy M. Chan Speeding up the Four Russians Algorithm by About One More Logarithmic Factor , 2015, SODA.

[4]  Alfred V. Aho,et al.  A Minimum Distance Error-Correcting Parser for Context-Free Languages , 1972, SIAM J. Comput.

[5]  Gerhard J. Woeginger,et al.  Space and Time Complexity of Exact Algorithms : Some Open Problems , 2004 .

[6]  Gerhard J. Woeginger,et al.  Open problems around exact algorithms , 2008, Discret. Appl. Math.

[7]  Walter L. Ruzzo,et al.  On the Complexity of General Context-Free Language Parsing and Recognition (Extended Abstract) , 1979, ICALP.

[8]  Virginia Vassilevska Williams,et al.  Multiplying matrices faster than coppersmith-winograd , 2012, STOC '12.

[9]  Barna Saha,et al.  The Dyck Language Edit Distance Problem in Near-Linear Time , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[10]  Dan Gusfield,et al.  A Simple, Practical and Complete O(n³/log n)-Time Algorithm for RNA Folding Using the Four-Russians Speedup , 2009, WABI.

[11]  Alexander M. Rush,et al.  On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing , 2010, EMNLP.

[12]  Michael Sipser,et al.  Introduction to the Theory of Computation , 1996, SIGA.

[13]  R. Gutell,et al.  A story: unpaired adenosine bases in ribosomal RNAs. , 2000, Journal of molecular biology.

[14]  Jeffrey D. Ullman,et al.  Introduction to automata theory, languages, and computation, 2nd edition , 2001, SIGA.

[15]  Yi-Jun Chang Conditional Lower Bound for RNA Folding Problem , 2015, ArXiv.

[16]  James H. Martin,et al.  Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000 .

[17]  Amir Abboud,et al.  Tight Hardness Results for LCS and Other Sequence Similarity Measures , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[18]  Barna Saha,et al.  Language Edit Distance and Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms and Connection to Fundamental Graph Problems , 2014, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[19]  Michal Ziv-Ukelson,et al.  Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach , 2010, Algorithms for Molecular Biology.

[20]  Rolf Backofen,et al.  Sparse RNA Folding: Time and Space Efficient Algorithms , 2009, CPM.

[21]  Iyad A. Kanj,et al.  Tight lower bounds for certain parameterized NP-hard problems , 2004, Proceedings. 19th IEEE Annual Conference on Computational Complexity, 2004.

[22]  Dan Gusfield,et al.  A simple, practical and complete O(n³/log n)-time algorithm for RNA folding using the four-Russians speedup , 2009, WABI 2009.

[23]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[24]  Alexandr Andoni,et al.  Approximating edit distance in near-linear time , 2009, STOC '09.

[25]  Michal Ziv-Ukelson,et al.  Edit Distance with Duplications and Contractions Revisited , 2011, CPM.

[26]  Dan Klein,et al.  K-Best A* Parsing , 2009, ACL.

[27]  Giorgio Satta,et al.  Tree-Adjoining Grammar Parsing and Boolean Matrix Multiplication , 1994, Comput. Linguistics.

[28]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[29]  Ge Xia,et al.  Strong computational lower bounds via parameterized complexity , 2006, J. Comput. Syst. Sci.

[30]  Joan-Andreu Sánchez,et al.  Fast Stochastic Context-Free Parsing: A Stochastic Version of the Valiant Algorithm , 2007, IbPRIA.

[31]  Alexandr Andoni,et al.  Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[32]  T. Speed,et al.  Biological Sequence Analysis , 1998 .

[33]  Eugene W. Myers,et al.  Approximately Matching Context-Free Languages , 1995, Inf. Process. Lett.

[34]  Moshe Lewenstein,et al.  Clustered Integer 3SUM via Additive Combinatorics , 2015, STOC.

[35]  Charles N. Fischer,et al.  On the Role of Error Productions in Syntactic Error Correction , 1980, Comput. Lang.

[36]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[37]  J. Håstad Clique is hard to approximate within $n^{1-\varepsilon}$ , 1999 .

[38]  F. L. Deremer,et al.  Practical translators for LR(k) languages , 1969 .

[39]  Leslie G. Valiant,et al.  General Context-Free Recognition in Less than Cubic Time , 1975, J. Comput. Syst. Sci.

[40]  Mark Jerrum,et al.  Large Cliques Elude the Metropolis Process , 1992, Random Struct. Algorithms.

[41]  Divesh Srivastava,et al.  On Repairing Structural Problems In Semi-structured Data , 2013, Proc. VLDB Endow.

[42]  Joel I. Seiferas,et al.  A Simplified Lower Bound for Context-Free-Language Recognition , 1986, Inf. Control.

[43]  Noam Chomsky,et al.  On Certain Formal Properties of Grammars , 1959, Inf. Control.

[44]  Wojciech Rytter,et al.  Context-Free Recognition via Shortest Paths Computation: A Version of Valiant's Algorithm , 1995, Theor. Comput. Sci.

[45]  J. Baker Trainable grammars for speech recognition , 1979 .

[46]  Piotr Indyk,et al.  Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false) , 2014, STOC.

[47]  Noga Alon,et al.  Finding a large hidden clique in a random graph , 1998, SODA '98.

[48]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[49]  Hervé Gallaire,et al.  Recognition Time of Context-Free Languages by On-Line Turing Machines , 1969, Inf. Control.

[50]  Sanguthevar Rajasekaran,et al.  An Error Correcting Parser for Context Free Grammars that Takes Less Than Cubic Time , 2014, LATA.

[51]  Tadao Kasami,et al.  An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages , 1965 .

[52]  Barna Saha,et al.  Faster Language Edit Distance, Connection to All-pairs Shortest Paths and Related Problems , 2014, ArXiv.

[53]  Wojciech Rytter,et al.  Fast Recognition of Pushdown Automaton and Context-free Languages , 1986, Inf. Control.

[54]  François Le Gall,et al.  Powers of tensors and fast matrix multiplication , 2014, ISSAC.

[55]  Dan Gusfield,et al.  Faster Algorithms for RNA-Folding Using the Four-Russians Method , 2013, WABI.

[56]  Ivan M. Havel,et al.  On the Parsing of Deterministic Languages , 1974, JACM.

[57]  Noga Alon,et al.  Testing k-wise and almost k-wise independence , 2007, STOC '07.

[58]  Ryan Williams,et al.  A new algorithm for optimal 2-constraint satisfaction and its implications , 2005, Theor. Comput. Sci.

[59]  Tatsuya Akutsu,et al.  Approximation and Exact Algorithms for RNA Secondary Structure Prediction and Recognition of Stochastic Context-free Languages , 1998, J. Comb. Optim.

[60]  Ge Xia,et al.  Tight lower bounds for certain parameterized NP-hard problems , 2004, Proceedings. 19th IEEE Annual Conference on Computational Complexity, 2004.

[61]  Ryan Williams,et al.  Losing Weight by Gaining Edges , 2013, ESA.

[62]  Friedrich Eisenbrand,et al.  On the complexity of fixed parameter clique and dominating set , 2004, Theor. Comput. Sci.

[63]  V. Strassen Gaussian elimination is not optimal , 1969 .

[64]  Ryan Williams,et al.  Faster all-pairs shortest paths via circuit complexity , 2013, STOC.

[65]  Robert Krauthgamer,et al.  How hard is it to approximate the best Nash equilibrium? , 2009, SODA.

[66]  Daniel H. Younger,et al.  Recognition and Parsing of Context-Free Languages in Time n^3 , 1967, Inf. Control.

[67]  J. Hartmanis,et al.  On the Computational Complexity of Algorithms , 1965 .

[68]  Svatopluk Poljak,et al.  On the complexity of the subgraph problem , 1985 .

[69]  Noga Alon,et al.  The monotone circuit complexity of boolean functions , 1987, Comb.

[70]  Huacheng Yu,et al.  An Improved Combinatorial Algorithm for Boolean Matrix Multiplication , 2015, ICALP.

[71]  Ronald L. Rivest,et al.  Introduction to Algorithms, third edition , 2009 .

[72]  Jay Earley,et al.  An efficient context-free parsing algorithm , 1970, Commun. ACM.

[73]  Virginia Vassilevska Williams,et al.  Efficient algorithms for clique problems , 2009, Inf. Process. Lett.

[74]  Marvin Künnemann,et al.  Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[75]  Giorgio Satta,et al.  Approximate PCFG Parsing Using Tensor Decomposition , 2013, NAACL.

[76]  Donald E. Knuth,et al.  On the Translation of Languages from Left to Right , 1965, Inf. Control.

[77]  Yinglei Song,et al.  Time and Space Efficient Algorithms for RNA Folding with the Four-Russians Technique , 2015, ArXiv.

[78]  Amir Abboud,et al.  Quadratic-Time Hardness of LCS and other Sequence Similarity Measures , 2015, ArXiv.

[79]  Walter L. Ruzzo,et al.  An Improved Context-Free Recognizer , 1980, ACM Trans. Program. Lang. Syst.