A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speeoch tags, the best reported results for this task.

[1]  George B. Dantzig,et al.  Decomposition Principle for Linear Programs , 1960 .

[2]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[3]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[4]  Michael I. Jordan,et al.  Probabilistic Independence Networks for Hidden Markov Probability Models , 1997, Neural Computation.

[5]  Srinivas Bangalore,et al.  Supertagging: An Approach to Almost Parsing , 1999, CL.

[6]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[7]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[8]  Mark Steedman,et al.  Generative Models for Statistical Parsing with Combinatory Categorial Grammar , 2002, ACL.

[9]  Stephen Clark,et al.  Supertagging for Combinatory Categorial Grammar , 2002, TAG+.

[10]  Léon Bottou,et al.  Stochastic Learning , 2003, Advanced Lectures on Machine Learning.

[11]  James R. Curran,et al.  The Importance of Supertagging for Wide-Coverage CCG Parsing , 2004, COLING.

[12]  Fernando Pereira,et al.  Case-factor diagrams for structured probabilistic modeling , 2004, J. Comput. Syst. Sci..

[13]  James R. Curran,et al.  Parsing the WSJ Using CCG and Log-Linear Models , 2004, ACL.

[14]  Andrew McCallum,et al.  Joint Parsing and Semantic Role Labeling , 2005, CoNLL.

[15]  Andrew Y. Ng,et al.  Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines , 2006, EMNLP.

[16]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[17]  James R. Curran,et al.  Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models , 2007, Computational Linguistics.

[18]  David A. McAllester,et al.  The Generalized A* Architecture , 2007, J. Artif. Intell. Res..

[19]  Nikos Komodakis,et al.  MRF Optimization via Dual Decomposition: Message-Passing Revisited , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Mark Steedman,et al.  CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank , 2007, CL.

[21]  Taisuke Sato,et al.  Inside-Outside Probability Computation for Belief Propagation , 2007, IJCAI.

[22]  Qun Liu,et al.  A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging , 2008, ACL.

[23]  Smaranda Muresan,et al.  Generalizing Word Lattice Translation , 2008, ACL.

[24]  Haitao Mi,et al.  Forest-based Translation Rule Extraction , 2008, EMNLP.

[25]  Qun Liu,et al.  Forest-Based Translation , 2008, ACL.

[26]  Christopher D. Manning,et al.  Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.

[27]  Mirella Lapata,et al.  Proceedings of ACL-08: HLT , 2008 .

[28]  Liang Huang,et al.  Forest Reranking: Discriminative Parsing with Non-Local Features , 2008, ACL.

[29]  David A. Smith,et al.  Dependency Parsing by Belief Propagation , 2008, EMNLP.

[30]  Christopher D. Manning,et al.  Joint Parsing and Named Entity Recognition , 2009, NAACL.

[31]  Markus Dreyer,et al.  Graphical Models over Multiple Strings , 2009, EMNLP.

[32]  Alexander M. Rush,et al.  On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing , 2010, EMNLP.

[33]  Gerald Penn,et al.  Accurate Context-Free Parsing with Combinatory Categorial Grammar , 2010, ACL.

[34]  James R. Curran,et al.  Faster Parsing by Supertagger Adaptation , 2010, ACL.

[35]  Eric P. Xing,et al.  Turbo Parsers: Dependency Parsing by Approximate Variational Inference , 2010, EMNLP.

[36]  Philip Resnik,et al.  Context-free reordering, finite-state translation , 2010, HLT-NAACL.

[37]  Alexander M. Rush,et al.  Dual Decomposition for Parsing with Non-Projective Head Automata , 2010, EMNLP.

[38]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..