Unsupervised Discourse Constituency Parsing Using Viterbi EM

In this paper, we introduce an unsupervised algorithm for discourse constituency parsing. We train a span-based discourse parser with Viterbi EM under a margin-based criterion, without using annotated discourse trees. We also propose initialization methods for Viterbi training of discourse constituents, based on our prior knowledge of text structures. Experimental results demonstrate that our unsupervised parser achieves performance comparable or even superior to that of fully supervised parsers. We also investigate the discourse constituents learned by our method.
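
The training loop described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the names `right_branching`, `cky_best_tree`, and `viterbi_em` are hypothetical, a per-span weight table stands in for the span-based parser's scoring model, a right-branching tree stands in for the knowledge-based initialization, and a perceptron-style update with a unit cost on mismatched spans stands in for the margin-based criterion. The sketch only shows the control flow of Viterbi (hard) EM: decode the single best tree, treat it as the training target, update the scorer, and repeat.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive indices (i, j) over elementary units


def right_branching(n: int) -> List[Span]:
    """Multi-unit spans of a right-branching tree (assumed initial guess)."""
    return [(i, n - 1) for i in range(n - 1)]


def cky_best_tree(n: int, score: Callable[[int, int], float]) -> List[Span]:
    """Return the multi-unit spans of the highest-scoring binary tree."""
    best: Dict[Span, float] = {(i, i): 0.0 for i in range(n)}
    split: Dict[Span, int] = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Choose the split point that maximizes the two subtree scores.
            k = max(range(i, j), key=lambda m: best[(i, m)] + best[(m + 1, j)])
            best[(i, j)] = score(i, j) + best[(i, k)] + best[(k + 1, j)]
            split[(i, j)] = k

    spans: List[Span] = []

    def collect(i: int, j: int) -> None:
        # Walk the back-pointers top-down, recording every multi-unit span.
        if i == j:
            return
        spans.append((i, j))
        collect(i, split[(i, j)])
        collect(split[(i, j)] + 1, j)

    collect(0, n - 1)
    return spans


def viterbi_em(n: int, init_spans: List[Span], epochs: int = 5, lr: float = 1.0):
    """Toy Viterbi EM over one document with a span weight table (assumption)."""
    weights: Dict[Span, float] = {}
    target = list(init_spans)
    for _ in range(epochs):
        # M-step: cost-augmented decoding, then a perceptron-style update
        # that pushes the target tree above the most violating tree.
        gold = set(target)
        rival = cky_best_tree(
            n, lambda i, j: weights.get((i, j), 0.0) + (0.0 if (i, j) in gold else 1.0)
        )
        for s in target:
            weights[s] = weights.get(s, 0.0) + lr
        for s in rival:
            weights[s] = weights.get(s, 0.0) - lr
        # E-step (hard): re-decode the single best tree under the new scores.
        target = cky_best_tree(n, lambda i, j: weights.get((i, j), 0.0))
    return target, weights
```

Because the weight table memorizes individual spans of a single document, this toy simply reinforces its initial tree. A real parser would share parameters across documents through learned span representations, which is what allows the hard E-step to revise the initial trees.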
