A Supertag-Context Model for Weakly-Supervised CCG Parser Learning

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that specify the syntactic configurations in which they may occur. We present a novel parsing model that captures the associative adjacent-category relationships intrinsic to CCG. It does so by parameterizing the relationships between each constituent label and the preterminal categories directly to its left and right, biasing the model toward constituent categories that can combine with their contexts. This builds on the intuitions of Klein and Manning’s (2002) “constituent-context” model, which demonstrated the value of modeling context, but has the advantage of being able to exploit the properties of CCG. Our experiments show that our model outperforms a baseline in which this context information is not captured.
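The notion of a constituent category "combining with its context" can be illustrated with a minimal sketch. It is not the model from the paper: it assumes only forward and backward application (a full CCG also has composition and other rules), it replaces the learned probabilities with a simple 0 to 2 count, and the names `Complex`, `combines`, and `context_fit` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Complex:
    """A complex CCG category such as S\\NP or (S\\NP)/NP."""
    result: "Category"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Category"


# Atomic categories are plain strings, e.g. "np" or "s".
Category = Union[str, Complex]


def combines(left: Category, right: Category) -> bool:
    """True if two adjacent categories combine by application.

    Assumption: only forward (X/Y Y => X) and backward (Y X\\Y => X)
    application are checked; other combinators are ignored.
    """
    forward = isinstance(left, Complex) and left.slash == "/" and left.arg == right
    backward = isinstance(right, Complex) and right.slash == "\\" and right.arg == left
    return forward or backward


def context_fit(left_tag: Category, constituent: Category, right_tag: Category) -> int:
    """Count how many of the two neighbouring supertags the constituent
    category can combine with (0, 1, or 2). A stand-in for the bias the
    abstract describes, not a probability."""
    return int(combines(left_tag, constituent)) + int(combines(constituent, right_tag))


# Illustrative categories: an intransitive verb phrase and a transitive verb.
VP = Complex("s", "\\", "np")   # S\NP
TV = Complex(VP, "/", "np")     # (S\NP)/NP
```

Under these assumptions, a span labeled `VP` with an `"np"` supertag to its left gets a higher fit than the same span labeled `"np"`, because `np S\NP` combines by backward application while `np np` does not.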

[1] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[2] Mark Steedman, et al. Combinatory Categorial Grammar, 2011.

[3] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[4] Aline Villavicencio. The acquisition of a unification-based generalised categorial grammar, 2002.

[5] Mark Steedman, et al. A* CCG Parsing with a Supertag-factored Model, 2014, EMNLP.

[6] Jason Baldridge. Weakly Supervised Supertagging with Grammar-Informed Initialization, 2008, COLING.

[7] Thomas L. Griffiths, et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo, 2007, NAACL.

[8] Jason Baldridge, et al. Non-Transformational Syntax: Formal and Explicit Models of Grammar, 2011.

[9] Mark Steedman, et al. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank, 2007, CL.

[10] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[11] Cristina Bosco, et al. Building a Treebank for Italian: a Data-driven Annotation Schema, 2000, LREC.

[12] Joshua Goodman, et al. Parsing Inside-Out, 1998, ArXiv.

[13] Mark Steedman, et al. Grammar Induction from Text Using Small Syntactic Prototypes, 2011, IJCNLP.

[14] Cristina Bosco, et al. Converting a dependency treebank to a categorial grammar treebank for Italian, 2009.

[15] Luke S. Zettlemoyer, et al. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form, 2007, EMNLP.

[16] Noah A. Smith, et al. Weakly-Supervised Bayesian Learning of a CCG Supertagger, 2014, CoNLL.

[17] Jason Baldridge, et al. Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries, 2012, EMNLP.

[18] Vladimir Solmon, et al. The estimation of stochastic context-free grammars using the Inside-Outside algorithm, 2003.

[19] Martha Palmer, et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus, 2005, Natural Language Engineering.

[20] Yonatan Bisk, et al. Simple Robust Grammar Induction with Combinatory Categorial Grammars, 2012, AAAI.

[21] Dan Klein, et al. A Generative Constituent-Context Model for Improved Grammar Induction, 2002, ACL.

[22] Zhiyi Chi, et al. Statistical Properties of Probabilistic Context-Free Grammars, 1999, CL.

[23] Radford, et al. 转换生成语法教程 = Transformational Grammar, 2000.

[24] Mark Johnson, et al. Nonparametric bayesian models of lexical acquisition, 2007.

[25] Mark Johnson, et al. Using Universal Linguistic Knowledge to Guide Grammar Induction, 2010, EMNLP.

[26] Noah A. Smith, et al. Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning, 2015, AAAI.

[27] Yonatan Bisk, et al. An HDP Model for Inducing Combinatory Categorial Grammars, 2013, TACL.

[28] Martin Kay, et al. Syntactic Process, 1979, ACL.

[29] James R. Curran, et al. Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank, 2010, COLING.