Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders

The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector-averaging approach is locally greedy and cannot recover from errors when computing the highest-scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees, by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that by *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2-6 F1, depending on the data used for fine-tuning.
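The contrast described above can be sketched as two ways of filling a chart cell during a bottom-up inside pass. The sketch below is illustrative, not the paper's implementation: `compose` and `score` are hypothetical stand-ins for the learned composition function and span scorer, and scores are kept as simple scalars. A DIORA-style cell stores one softly-weighted mixture over all split points; an S-DIORA-style cell keeps a small beam of single-tree candidates selected by hard argmax.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # span-vector dimensionality

def compose(left, right):
    # stand-in for the learned composition (e.g. an MLP or TreeLSTM cell)
    return np.tanh(left + right)

def score(vec):
    # stand-in for the learned compatibility score of a composed span
    return float(vec.sum())

def inside_pass(leaves, beam=None):
    """Bottom-up inside pass; chart[i][j] holds (score, vector) candidates."""
    n = len(leaves)
    chart = [[[] for _ in range(n)] for _ in range(n)]
    for i, v in enumerate(leaves):
        chart[i][i] = [(0.0, v)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cands = []
            for k in range(i, j):  # enumerate split points
                for sl, vl in chart[i][k]:
                    for sr, vr in chart[k + 1][j]:
                        v = compose(vl, vr)
                        cands.append((sl + sr + score(v), v))
            if beam is None:
                # DIORA-style cell: one vector that softly averages over
                # all split points, weighted by normalized scores, so the
                # cell never commits to a single subtree
                scores = np.array([s for s, _ in cands])
                weights = np.exp(scores - scores.max())
                weights /= weights.sum()
                mixture = sum(w * v for w, (_, v) in zip(weights, cands))
                chart[i][j] = [(float(scores.max()), mixture)]
            else:
                # S-DIORA-style cell: hard argmax with a beam, so every
                # stored candidate encodes exactly one discrete tree
                cands.sort(key=lambda c: c[0], reverse=True)
                chart[i][j] = cands[:beam]
    return chart[0][n - 1]
```

With `beam=1`, every cell commits to its single best subtree; a larger beam lets a cell defer the hard choice and recover from a locally greedy error higher in the chart, which is the failure mode of the averaging approach that S-DIORA targets.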

References

[1] Max Welling, et al. Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. ICML, 2019.
[2] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 1993.
[3] Fernando Pereira, et al. Inside-Outside Reestimation From Partially Bracketed Corpora. HLT, 1992.
[4] Naoki Kobayashi, et al. Split or Merge: Which is Better for Unsupervised RST Parsing? EMNLP, 2019.
[5] Alexander M. Rush, et al. Compound Probabilistic Context-Free Grammars for Grammar Induction. ACL, 2019.
[6] Arthur Mensch, et al. Differentiable Dynamic Programming for Structured Prediction and Attention. ICML, 2018.
[7] Christopher Potts, et al. A large annotated corpus for learning natural language inference. EMNLP, 2015.
[8] Gideon S. Mann, et al. Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria. ACL/IJCNLP, 2009.
[9] James R. Curran, et al. Parsing Noun Phrases in the Penn Treebank. Computational Linguistics, 2011.
[10] Jason Baldridge, et al. Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models. ACL, 2011.
[11] Kyunghyun Cho, et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models. ICLR, 2020.
[12] Andrew McCallum, et al. Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP, 2018.
[13] Ramón Fernández Astudillo, et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. ICML, 2016.
[14] Jason Eisner, et al. Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper). SPNLP@EMNLP, 2016.
[15] Dan Klein, et al. A Minimal Span-Based Neural Constituency Parser. ACL, 2017.
[16] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. NAACL, 2017.
[17] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax. ICLR, 2016.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.
[19] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. ACL, 2005.
[20] Dan Klein, et al. Prototype-Driven Grammar Induction. ACL, 2006.
[21] Luke S. Zettlemoyer, et al. Syntactic Scaffolds for Semantic Structures. EMNLP, 2018.
[22] Ivan Titov, et al. Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder. ICLR, 2018.
[23] Dana Angluin, et al. Inductive Inference of Formal Languages from Positive Data. Information and Control, 1980.
[24] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit. ACL, 2014.
[25] Regina Barzilay, et al. Using Semantic Cues to Learn Syntax. AAAI, 2011.
[26] Kevin Gimpel, et al. Visually Grounded Neural Syntax Acquisition. ACL, 2019.
[27] Glenn Carroll, et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora. 1992.
[28] Andrew McCallum, et al. Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders. EMNLP, 2019.
[29] Jihun Choi, et al. Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction. ICLR, 2020.
[30] Jason Eisner, et al. Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing. TACL, 2017.
[31] J. Baker. Trainable grammars for speech recognition. 1979.
[32] Alexander M. Rush, et al. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing. Journal of Artificial Intelligence Research, 2012.
[33] Claire Cardie, et al. SparseMAP: Differentiable Sparse Structured Inference. ICML, 2018.
[34] Philip Resnik, et al. Left-Corner Parsing and Psychological Plausibility. COLING, 1992.
[35] Samuel R. Bowman, et al. Do latent tree learning models identify meaningful structure in sentences? TACL, 2017.
[36] Dan Klein, et al. Natural Language Grammar Induction Using a Constituent-Context Model. NIPS, 2001.
[37] Mark Johnson, et al. Using Universal Linguistic Knowledge to Guide Grammar Induction. EMNLP, 2010.
[38] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.
[39] Kevin Gimpel, et al. Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation. EMNLP, 2014.
[40] Rens Bod, et al. An All-Subtrees Approach to Unsupervised Parsing. ACL, 2006.
[41] Taylor L. Booth, et al. Grammatical Inference: Introduction and Survey-Part II. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.
[42] Dan Klein, et al. Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs. NIPS, 2009.
[43] Samuel R. Bowman, et al. Self-Training for Unsupervised Parsing with PRPN. IWPT, 2020.
[44] Omer Levy, et al. Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling. ACL, 2018.
[45] Michael I. Jordan, et al. Probabilistic grammars and hierarchical Dirichlet processes. Oxford Handbooks Online, 2018.
[46] Aaron C. Courville, et al. Neural Language Modeling by Jointly Learning Syntax and Lexicon. ICLR, 2017.
[47] Regina Barzilay, et al. Unsupervised Multilingual Grammar Induction. ACL, 2009.
[48] Mohit Yadav, et al. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders. NAACL, 2019.
[49] Dan Klein, et al. Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output. EMNLP, 2012.
[50] Aaron C. Courville, et al. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. ICLR, 2018.
[51] Brian Roark, et al. Efficient probabilistic top-down and left-corner parsing. ACL, 1999.
[52] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations. NAACL, 2018.
[53] Dan Klein, et al. Constituency Parsing with a Self-Attentive Encoder. ACL, 2018.
[54] Sanket Vaibhav Mehta, et al. Gradient-Based Inference for Networks with Output Constraints. AAAI, 2017.
[55] Chris Dyer, et al. Syntactic Structure Distillation Pretraining for Bidirectional Encoders. TACL, 2020.
[56] Noah A. Smith, et al. Guiding Unsupervised Grammar Induction Using Contrastive Estimation. 2005.
[57] Yonatan Bisk, et al. Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction. ACL, 2015.
[58] Noah A. Smith, et al. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data. ACL, 2005.
[59] Philipp Koehn, et al. Feature-Rich Statistical Translation of Noun Phrases. ACL, 2003.
[60] James Cross, et al. Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles. EMNLP, 2016.
[61] Ivan Titov, et al. Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming. ACL, 2019.
[62] Phong Le, et al. Unsupervised Dependency Parsing: Let’s Use Supervised Parsers. NAACL, 2015.
[63] Fernando Pereira, et al. Case-factor diagrams for structured probabilistic modeling. Journal of Computer and System Sciences, 2004.