Cross-lingual RST Discourse Parsing

Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models on the English RST Discourse Treebank. However, discourse treebanks also exist for other languages, including Spanish, German, Basque, Dutch and Brazilian Portuguese. These treebanks share the same underlying linguistic theory, Rhetorical Structure Theory (RST), but differ slightly in how documents are annotated. In this paper, we present (a) a new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on two of three metrics), (b) a harmonization of discourse treebanks across languages, which enables us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing.
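One plausible ingredient of harmonizing treebanks that "differ slightly in the way documents are annotated" is mapping each treebank's relation labels onto a shared inventory, so that trees from different languages can be used together. The sketch below illustrates only that idea, under stated assumptions: the `Node` class, the `harmonize` function, the `RELATION_MAP` table and every label in it are hypothetical examples, not the actual harmonized label set or procedure used in this work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical mapping from treebank-specific relation labels (lowercased)
# to a shared, coarser label set. Labels are illustrative only.
RELATION_MAP: Dict[str, str] = {
    "elaboration-additional": "Elaboration",
    "elaboration": "Elaboration",
    "elaboración": "Elaboration",
    "cause-result": "Cause",
    "causa": "Cause",
    "contrast": "Contrast",
    "contraste": "Contrast",
}


@dataclass
class Node:
    """A minimal RST tree node: leaves (EDUs) carry text, internal nodes a relation."""
    relation: Optional[str] = None    # None for leaves (EDUs)
    nuclearity: Optional[str] = None  # e.g. "NS", "SN" or "NN" for internal nodes
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None        # EDU text for leaves


def harmonize(node: Node, mapping: Dict[str, str], default: str = "Other") -> Node:
    """Return a new tree whose relation labels are mapped to the shared set.

    Labels missing from `mapping` fall back to `default`. Tree shape,
    nuclearity and EDU text are copied unchanged; the input is not mutated.
    """
    if node.relation is None:
        # Leaf (EDU): nothing to relabel.
        return Node(text=node.text)
    label = mapping.get(node.relation.lower(), default)
    return Node(
        relation=label,
        nuclearity=node.nuclearity,
        children=[harmonize(child, mapping, default) for child in node.children],
    )
```

For example, `harmonize(Node(relation="Elaboración", nuclearity="NS", children=[Node(text="A"), Node(text="B")]), RELATION_MAP)` yields a tree labelled `Elaboration` with the same two EDUs. Returning a fresh tree rather than relabelling in place keeps the original treebank annotation available for comparison.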
