Weakly-Supervised Bayesian Learning of a CCG Supertagger

We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of crosslinguistically common category structures as well as transitions between tags that can combine locally according to CCG’s combinators. Our prior is theoretically appealing since it is motivated by languageindependent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.

[1]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[2]  Kevin Knight,et al.  Minimized Models for Unsupervised Part-of-Speech Tagging , 2009, ACL.

[3]  Mark Johnson,et al.  Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars , 2009, NAACL.

[4]  Yonatan Bisk,et al.  An HDP Model for Inducing Combinatory Categorial Grammars , 2013, TACL.

[5]  Cristina Bosco,et al.  Building a Treebank for Italian: a Data-driven Annotation Schema , 2000, LREC.

[6]  Luke S. Zettlemoyer,et al.  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.

[7]  Jason Baldridge,et al.  Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries , 2012, EMNLP.

[8]  Jason Baldridge,et al.  Learning a Part-of-Speech Tagger from Two Hours of Annotation , 2013, NAACL.

[9]  M. Steedman,et al.  Combinatory Categorial Grammar , 2011 .

[10]  Regina Barzilay,et al.  Simple Type-Level Unsupervised POS Tagging , 2010, EMNLP.

[11]  Slav Petrov,et al.  Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections , 2011, ACL.

[12]  Julian M. Kupiec,et al.  Robust part-of-speech tagging using a hidden Markov model , 1992 .

[13]  Jason Baldridge,et al.  Non-Transformational Syntax: Formal and Explicit Models of Grammar , 2011 .

[14]  Mark Steedman,et al.  CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank , 2007, CL.

[15]  Yoav Goldberg,et al.  EM Can Find Pretty Good HMM POS-Taggers (When Given a Good Start) , 2008, ACL.

[16]  M. A. R T A P A L,et al.  The Penn Chinese TreeBank: Phrase structure annotation of a large corpus , 2005, Natural Language Engineering.

[17]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[18]  Phil Blunsom,et al.  A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction , 2011, ACL.

[19]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[20]  Srinivas Bangalore,et al.  Supertagging: An Approach to Almost Parsing , 1999, CL.

[21]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[22]  Jason Baldridge,et al.  Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages , 2013, ACL.

[23]  Yee Whye Teh,et al.  Beam sampling for the infinite hidden Markov model , 2008, ICML '08.

[24]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.

[25]  R. Kohn,et al.  On Gibbs sampling for state space models , 1994 .

[26]  Zhiyi Chi,et al.  Statistical Properties of Probabilistic Context-Free Grammars , 1999, CL.

[27]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[28]  Zoubin Ghahramani,et al.  The infinite HMM for unsupervised PoS tagging , 2009, EMNLP.

[29]  James R. Curran,et al.  Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank , 2010, COLING.

[30]  Adam Lopez,et al.  Using Categorial Grammar to Label Translation Rules , 2012, WMT@NAACL-HLT.

[31]  Jason Baldridge Weakly Supervised Supertagging with Grammar-Informed Initialization , 2008, COLING.

[32]  Jason Baldridge,et al.  Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons , 2010, ACL.

[33]  Cristina Bosco,et al.  Converting a dependency treebank to a categorial grammar treebank for Italian , 2009 .

[34]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.