Multi-view and multi-task training of RST discourse parsers

We experiment with different ways of training LSTM networks to predict RST discourse trees. The main challenge for RST discourse parsing is the limited amount of available training data. We address this by regularizing our models with supervision from related tasks as well as from alternative views on discourse structure. We show that a simple sequential LSTM discourse parser benefits from this multi-view and multi-task framework, with 12-15% error reductions over our baseline (depending on the metric), and achieves results that rival more complex state-of-the-art parsers.
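The abstract describes the approach only at a high level. The sketch below illustrates the general multi-task idea it relies on, hard parameter sharing: a single shared encoder, one output layer per task, and training that alternates between tasks so the auxiliary supervision regularizes the shared parameters. It is not the authors' implementation. It assumes a one-layer tanh encoder as a stand-in for the LSTM, and the task names `main` and `aux`, the toy data, and the hyperparameters are all hypothetical. One plausible way to add an alternative view of the discourse structure is as yet another head over the same inputs.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MultiTaskNet:
    """Shared tanh encoder (stand-in for an LSTM) plus one softmax head per task."""

    def __init__(self, d_in, d_h, tasks, seed=0):
        # tasks: dict mapping a (hypothetical) task name -> number of classes
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.5, 0.5)

        self.W = [[init() for _ in range(d_in)] for _ in range(d_h)]  # shared
        self.b = [0.0] * d_h                                          # shared
        self.heads = {
            t: ([[init() for _ in range(d_h)] for _ in range(k)], [0.0] * k)
            for t, k in tasks.items()
        }

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                for row, bi in zip(self.W, self.b)]

    def predict_proba(self, task, x):
        h = self.encode(x)
        V, c = self.heads[task]
        logits = [sum(v * hj for v, hj in zip(row, h)) + ck
                  for row, ck in zip(V, c)]
        return h, softmax(logits)

    def sgd_step(self, task, x, y, lr):
        """One cross-entropy SGD update for `task`; returns the example loss."""
        h, p = self.predict_proba(task, x)
        V, c = self.heads[task]
        d_logits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Gradient w.r.t. the shared representation, computed before V changes.
        d_h = [sum(d_logits[k] * V[k][j] for k in range(len(V)))
               for j in range(len(h))]
        for k, g in enumerate(d_logits):  # update only this task's head
            for j in range(len(h)):
                V[k][j] -= lr * g * h[j]
            c[k] -= lr * g
        for j, g in enumerate(d_h):  # shared encoder sees every task's gradient
            g_pre = g * (1.0 - h[j] ** 2)  # derivative of tanh
            for i in range(len(x)):
                self.W[j][i] -= lr * g_pre * x[i]
            self.b[j] -= lr * g_pre
        return -math.log(p[y])


def train(model, data, steps, lr=0.2, seed=0):
    """Each step samples a task uniformly, then one example from that task."""
    rng = random.Random(seed)
    tasks = sorted(data)
    for _ in range(steps):
        task = rng.choice(tasks)
        x, y = rng.choice(data[task])
        model.sgd_step(task, x, y, lr)


def mean_loss(model, task, examples):
    return sum(-math.log(model.predict_proba(task, x)[1][y])
               for x, y in examples) / len(examples)


# Hypothetical toy data: two binary tasks defined over the same inputs.
rng = random.Random(1)
points = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
data = {
    "main": [(x, int(x[0] + x[1] > 0)) for x in points],
    "aux": [(x, int(x[0] - x[2] > 0)) for x in points],
}
model = MultiTaskNet(d_in=4, d_h=8, tasks={"main": 2, "aux": 2})
before = mean_loss(model, "main", data["main"])
train(model, data, steps=3000)
after = mean_loss(model, "main", data["main"])
print(f"main-task loss: {before:.3f} -> {after:.3f}")
```

The design choice that matters here is that `W` and `b` are updated by every task while each head is updated only by its own task. The auxiliary loss therefore shapes the shared representation without competing for the main task's output layer.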
