Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders

The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector-averaging approach is locally greedy and cannot recover from errors when computing the highest-scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees, by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that by *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2-6 F1, depending on the data used for fine-tuning.
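The contrast described above can be sketched as two ways of filling a chart cell during a bottom-up inside pass. The sketch below is illustrative, not the paper's implementation: `compose` and `score` are hypothetical stand-ins for the learned composition function and span scorer, and scores are kept as simple scalars. A DIORA-style cell stores one softly-weighted mixture over all split points; an S-DIORA-style cell keeps a small beam of single-tree candidates selected by hard argmax.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # span-vector dimensionality

def compose(left, right):
    # stand-in for the learned composition (e.g. an MLP or TreeLSTM cell)
    return np.tanh(left + right)

def score(vec):
    # stand-in for the learned compatibility score of a composed span
    return float(vec.sum())

def inside_pass(leaves, beam=None):
    """Bottom-up inside pass; chart[i][j] holds (score, vector) candidates."""
    n = len(leaves)
    chart = [[[] for _ in range(n)] for _ in range(n)]
    for i, v in enumerate(leaves):
        chart[i][i] = [(0.0, v)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cands = []
            for k in range(i, j):  # enumerate split points
                for sl, vl in chart[i][k]:
                    for sr, vr in chart[k + 1][j]:
                        v = compose(vl, vr)
                        cands.append((sl + sr + score(v), v))
            if beam is None:
                # DIORA-style cell: one vector that softly averages over
                # all split points, weighted by normalized scores, so the
                # cell never commits to a single subtree
                scores = np.array([s for s, _ in cands])
                weights = np.exp(scores - scores.max())
                weights /= weights.sum()
                mixture = sum(w * v for w, (_, v) in zip(weights, cands))
                chart[i][j] = [(float(scores.max()), mixture)]
            else:
                # S-DIORA-style cell: hard argmax with a beam, so every
                # stored candidate encodes exactly one discrete tree
                cands.sort(key=lambda c: c[0], reverse=True)
                chart[i][j] = cands[:beam]
    return chart[0][n - 1]
```

With `beam=1`, every cell commits to its single best subtree; a larger beam lets a cell defer the hard choice and recover from a locally greedy error higher in the chart, which is the failure mode of the averaging approach that S-DIORA targets.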

References

[1] Max Welling, et al. Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. ICML, 2019.
[2] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 1993.
[3] Fernando Pereira, et al. Inside-Outside Reestimation From Partially Bracketed Corpora. HLT, 1992.
[4] Naoki Kobayashi, et al. Split or Merge: Which is Better for Unsupervised RST Parsing? EMNLP, 2019.
[5] Alexander M. Rush, et al. Compound Probabilistic Context-Free Grammars for Grammar Induction. ACL, 2019.
[6] Arthur Mensch, et al. Differentiable Dynamic Programming for Structured Prediction and Attention. ICML, 2018.
[7] Christopher Potts, et al. A large annotated corpus for learning natural language inference. EMNLP, 2015.
[8] Gideon S. Mann, et al. Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria. ACL/IJCNLP, 2009.
[9] James R. Curran, et al. Parsing Noun Phrases in the Penn Treebank. Computational Linguistics, 2011.
[10] Jason Baldridge, et al. Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models. ACL, 2011.
[11] Kyunghyun Cho, et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models. ICLR, 2020.
[12] Andrew McCallum, et al. Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP, 2018.
[13] Ramón Fernández Astudillo, et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. ICML, 2016.
[14] Jason Eisner, et al. Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper). SPNLP@EMNLP, 2016.
[15] Dan Klein, et al. A Minimal Span-Based Neural Constituency Parser. ACL, 2017.
[16] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. NAACL, 2017.
[17] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax. ICLR, 2016.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.
[19] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. ACL, 2005.
[20] Dan Klein, et al. Prototype-Driven Grammar Induction. ACL, 2006.
[21] Luke S. Zettlemoyer, et al. Syntactic Scaffolds for Semantic Structures. EMNLP, 2018.
[22] Ivan Titov, et al. Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder. ICLR, 2018.
[23] Dana Angluin, et al. Inductive Inference of Formal Languages from Positive Data. Information and Control, 1980.
[24] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit. ACL, 2014.
[25] Regina Barzilay, et al. Using Semantic Cues to Learn Syntax. AAAI, 2011.
[26] Kevin Gimpel, et al. Visually Grounded Neural Syntax Acquisition. ACL, 2019.
[27] Glenn Carroll, et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora. 1992.
[28] Andrew McCallum, et al. Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders. EMNLP, 2019.
[29] Jihun Choi, et al. Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction. ICLR, 2020.
[30] Jason Eisner, et al. Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing. TACL, 2017.
[31] J. Baker. Trainable grammars for speech recognition. 1979.
[32] Alexander M. Rush, et al. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing. Journal of Artificial Intelligence Research, 2012.
[33] Claire Cardie, et al. SparseMAP: Differentiable Sparse Structured Inference. ICML, 2018.
[34] Philip Resnik, et al. Left-Corner Parsing and Psychological Plausibility. COLING, 1992.
[35] Samuel R. Bowman, et al. Do latent tree learning models identify meaningful structure in sentences? TACL, 2017.
[36] Dan Klein, et al. Natural Language Grammar Induction Using a Constituent-Context Model. NIPS, 2001.
[37] Mark Johnson, et al. Using Universal Linguistic Knowledge to Guide Grammar Induction. EMNLP, 2010.
[38] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.
[39] Kevin Gimpel, et al. Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation. EMNLP, 2014.
[40] Rens Bod, et al. An All-Subtrees Approach to Unsupervised Parsing. ACL, 2006.
[41] Taylor L. Booth, et al. Grammatical Inference: Introduction and Survey-Part II. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.
[42] Dan Klein, et al. Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs. NIPS, 2009.
[43] Samuel R. Bowman, et al. Self-Training for Unsupervised Parsing with PRPN. IWPT, 2020.
[44] Omer Levy, et al. Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling. ACL, 2018.
[45] Michael I. Jordan, et al. Probabilistic grammars and hierarchical Dirichlet processes. Oxford Handbooks Online, 2018.
[46] Aaron C. Courville, et al. Neural Language Modeling by Jointly Learning Syntax and Lexicon. ICLR, 2017.
[47] Regina Barzilay, et al. Unsupervised Multilingual Grammar Induction. ACL, 2009.
[48] Mohit Yadav, et al. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders. NAACL, 2019.
[49] Dan Klein, et al. Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output. EMNLP, 2012.
[50] Aaron C. Courville, et al. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. ICLR, 2018.
[51] Brian Roark, et al. Efficient probabilistic top-down and left-corner parsing. ACL, 1999.
[52] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations. NAACL, 2018.
[53] Dan Klein, et al. Constituency Parsing with a Self-Attentive Encoder. ACL, 2018.
[54] Sanket Vaibhav Mehta, et al. Gradient-Based Inference for Networks with Output Constraints. AAAI, 2017.
[55] Chris Dyer, et al. Syntactic Structure Distillation Pretraining for Bidirectional Encoders. TACL, 2020.
[56] Noah A. Smith, et al. Guiding Unsupervised Grammar Induction Using Contrastive Estimation. 2005.
[57] Yonatan Bisk, et al. Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction. ACL, 2015.
[58] Noah A. Smith, et al. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data. ACL, 2005.
[59] Philipp Koehn, et al. Feature-Rich Statistical Translation of Noun Phrases. ACL, 2003.
[60] James Cross, et al. Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles. EMNLP, 2016.
[61] Ivan Titov, et al. Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming. ACL, 2019.
[62] Phong Le, et al. Unsupervised Dependency Parsing: Let’s Use Supervised Parsers. NAACL, 2015.
[63] Fernando Pereira, et al. Case-factor diagrams for structured probabilistic modeling. Journal of Computer and System Sciences, 2004.