Co-training an Unsupervised Constituency Parser with Weak Supervision

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify whether a node dominates a specific span in a sentence. There are two types of classifiers: an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with these two classifiers, we show that the interplay between them helps improve the accuracy of both and, as a result, yields an effective parser. A seed bootstrapping technique prepares the data used to train the classifiers. Our analyses further validate that this approach, in conjunction with weak supervision from prior knowledge of a language's branching tendency (left- or right-branching) and minimal heuristics, injects a strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we demonstrate the effectiveness of our architecture by evaluating on the Chinese (CTB) and Japanese (KTB) treebanks, where we achieve new state-of-the-art results.
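
To make the co-training loop described above concrete, here is a minimal sketch in which two view-specific classifiers pseudo-label unlabeled candidate spans for each other. Everything in it is assumed for illustration: featurize_inside and featurize_outside are hypothetical stand-ins for the paper's inside and outside span representations, the logistic-regression models and the 0.95 confidence threshold are placeholders, and the data is synthetic. It is not the paper's implementation.

```python
# Sketch of co-training with an inside view and an outside view of each span.
# Confident pseudo-labels from one classifier become training data for the other.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def featurize_inside(spans):
    # Placeholder: features computed from the tokens inside each span.
    return spans

def featurize_outside(spans):
    # Placeholder: features computed from everything outside each span.
    return -spans

# Seed-bootstrapped labels (e.g., from branching heuristics): 1 = constituent.
X_seed = rng.normal(size=(40, 8))
y_seed = (X_seed.sum(axis=1) > 0).astype(int)
X_pool = rng.normal(size=(200, 8))          # unlabeled candidate spans

inside_clf = LogisticRegression(max_iter=1000)
outside_clf = LogisticRegression(max_iter=1000)

X_in, y_in = featurize_inside(X_seed), y_seed.copy()
X_out, y_out = featurize_outside(X_seed), y_seed.copy()

for _ in range(5):                          # a few co-training rounds
    inside_clf.fit(X_in, y_in)
    outside_clf.fit(X_out, y_out)

    # Each view scores the pool; only high-confidence predictions cross over.
    p_in = inside_clf.predict_proba(featurize_inside(X_pool))
    p_out = outside_clf.predict_proba(featurize_outside(X_pool))
    conf_in = p_in.max(axis=1) > 0.95
    conf_out = p_out.max(axis=1) > 0.95

    # Inside view teaches the outside view, and vice versa.
    X_out = np.vstack([X_out, featurize_outside(X_pool[conf_in])])
    y_out = np.concatenate([y_out, p_in[conf_in].argmax(axis=1)])
    X_in = np.vstack([X_in, featurize_inside(X_pool[conf_out])])
    y_in = np.concatenate([y_in, p_out[conf_out].argmax(axis=1)])
```

Exchanging only high-confidence predictions between the two views is the standard co-training recipe; a real implementation would additionally remove newly labeled spans from the pool and derive parses from the classifiers' span scores.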
