Grammar Factorization by Tree Decomposition

We describe the application of treewidth, a graph-theoretic property, to the problem of finding efficient parsing algorithms. This method, similar to the junction tree algorithm used in graphical models for machine learning, allows automatic discovery of efficient algorithms such as the O(n^4) algorithm for bilexical grammars of Eisner and Satta. We examine the complexity of applying this method to parsing algorithms for general Linear Context-Free Rewriting Systems. We show that any polynomial-time algorithm for this problem would imply an improved approximation algorithm for the well-studied treewidth problem on general graphs.
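As a rough illustration of how treewidth relates to parsing complexity, the sketch below builds a dependency graph over the index variables of a single deduction rule and searches for the elimination order with the smallest largest bag. The graph construction is an assumption made for this example, not a construction stated in the abstract: each item in the rule (and the head-to-head dependency `h`, `h2`) is treated as a clique over its variables. The function names and the brute-force search are likewise hypothetical and only practical for very small rules.

```python
from itertools import combinations, permutations


def largest_bag(vertices, cliques, order):
    """Size of the largest bag created when eliminating vertices in `order`."""
    # Assumed construction: variables that co-occur in an item are adjacent.
    adj = {v: set() for v in vertices}
    for clique in cliques:
        for u, v in combinations(clique, 2):
            adj[u].add(v)
            adj[v].add(u)

    worst = 0
    for v in order:
        nbrs = adj[v]
        # The bag holds the eliminated vertex and its current neighbours.
        worst = max(worst, len(nbrs) + 1)
        # Connect the neighbours to each other (fill-in edges).
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        # Remove the eliminated vertex from the graph.
        for u in nbrs:
            adj[u].discard(v)
        del adj[v]
    return worst


def best_exponent(vertices, cliques):
    """Minimum over all elimination orders of the largest bag size.

    This equals treewidth + 1 and is read here as the exponent of n
    in the running time of the factorized rule. Brute force, so only
    suitable for a handful of variables.
    """
    return min(largest_bag(vertices, cliques, order)
               for order in permutations(vertices))


# Bilexical rule combining [i, j, h] and [j, k, h2] into [i, k, h],
# with a rule weight that depends on both heads h and h2.
bilexical_vars = ["i", "j", "k", "h", "h2"]
bilexical_cliques = [("i", "j", "h"), ("j", "k", "h2"),
                     ("i", "k", "h"), ("h", "h2")]

# Plain binary context-free rule combining [i, j] and [j, k] into [i, k].
cfg_vars = ["i", "j", "k"]
cfg_cliques = [("i", "j"), ("j", "k"), ("i", "k")]

print(len(bilexical_vars), best_exponent(bilexical_vars, bilexical_cliques))
print(len(cfg_vars), best_exponent(cfg_vars, cfg_cliques))
```

Under these assumptions the bilexical rule has five variables, so applying it directly takes O(n^5) time, while the best elimination order yields bags of at most four variables, matching the O(n^4) bound mentioned above. The binary context-free rule cannot be improved beyond its three variables.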
