Supertagging with CCG primitives

In CCG and other highly lexicalized grammars, supertagging a sentence’s words with their lexical categories is a critical step for efficient parsing. Because of the high degree of lexicalization in these grammars, the lexical categories can be very complex. Existing approaches to supervised CCG supertagging treat the categories as atomic units, even when the categories are not simple; when they encounter words with categories unseen during training, their guesses are accordingly unsophisticated. In this paper, we make use of the primitives and operators that constitute the lexical categories of categorial grammars. Instead of opaque labels, we treat lexical categories themselves as linear sequences. We present an LSTM-based model that replaces standard word-level classification with prediction of a sequence of primitives, in the manner of an LSTM decoder. Our model obtains state-of-the-art word accuracy for single-task English CCG supertagging, increases parser coverage and F1, and is able to produce novel categories. Analysis shows a synergistic effect between this decomposed view and incorporation of prediction history.
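
To make the decomposed view concrete, the sketch below is an illustration in Python rather than the authors' released code: it tokenizes a CCG category string into a linear sequence of primitives and operators, and pairs it with a toy per-word LSTM decoder that emits such a sequence greedily from a contextual word vector. The tokenization regex, class and parameter names, and dimensions are all assumptions made for illustration.

```python
# Minimal sketch of (i) decomposing a CCG lexical category into a sequence of
# primitives and operators, and (ii) decoding that sequence per word with an
# LSTM instead of predicting one atomic label. Illustrative only; names,
# dimensions, and the tokenization scheme are assumptions, not the paper's code.
import re
import torch
import torch.nn as nn

def tokenize_category(cat: str) -> list:
    """Split a category such as (S[dcl]\\NP)/NP into primitives and operators."""
    return re.findall(r"[A-Za-z]+(?:\[[a-z]*\])?|[\\/()]", cat)

class CategoryDecoder(nn.Module):
    """Greedily decode a sequence of category primitives from one word vector."""
    def __init__(self, word_dim: int, vocab_size: int, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.init_h = nn.Linear(word_dim, hidden_dim)  # word vector -> initial state
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_vec, bos_id: int, eos_id: int, max_len: int = 20):
        # word_vec: (1, word_dim) contextual representation of a single word
        h = torch.tanh(self.init_h(word_vec))
        c = torch.zeros_like(h)
        tok = torch.tensor([bos_id])
        preds = []
        for _ in range(max_len):  # greedy decoding, one primitive per step
            h, c = self.lstm(self.embed(tok), (h, c))
            tok = self.out(h).argmax(dim=-1)
            if tok.item() == eos_id:
                break
            preds.append(tok.item())
        return preds

print(tokenize_category("(S[dcl]\\NP)/NP"))
# -> ['(', 'S[dcl]', '\\', 'NP', ')', '/', 'NP']

# Random-weight demo of the decoder interface (outputs are untrained noise):
decoder = CategoryDecoder(word_dim=1024, vocab_size=40)
print(decoder(torch.randn(1, 1024), bos_id=0, eos_id=1))
```

Because the decoder composes categories from primitives rather than selecting from a fixed label inventory, it can in principle emit category sequences never observed in training, which is the property the abstract highlights.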
