Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

We propose a novel approach for developing a two-stage document-level discourse parser. Our parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combine these two stages of discourse parsing effectively. A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state of the art, often by a wide margin.
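The abstract does not spell out the parsing algorithm, so the following is only a minimal sketch of how a most-probable tree could be recovered from span probabilities, assuming a CKY-style dynamic program. The names `best_tree` and `span_prob` are hypothetical; `span_prob` stands in for whatever probabilities the two Conditional Random Fields would supply, and relation labels are left out for brevity.

```python
import math
from functools import lru_cache


def best_tree(n, span_prob):
    """Return (log_prob, tree) for the most probable binary tree over units 0..n-1.

    span_prob(i, k, j) is a hypothetical callback giving the probability that
    units i..k and units k+1..j are joined into one constituent. In the paper's
    setting these values would come from the CRF models; here they are toy numbers.
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0.0, i  # a single unit is a leaf with log-probability 0
        best = (-math.inf, None)
        for k in range(i, j):  # try every split point inside the span
            p = span_prob(i, k, j)
            if p <= 0.0:
                continue  # skip impossible splits to avoid log(0)
            left_score, left_tree = solve(i, k)
            right_score, right_tree = solve(k + 1, j)
            score = math.log(p) + left_score + right_score
            if score > best[0]:
                best = (score, (left_tree, right_tree))
        return best

    return solve(0, n - 1)


# Toy probabilities for three discourse units (illustrative values only).
toy = {(0, 0, 1): 0.9, (1, 1, 2): 0.2, (0, 0, 2): 0.5, (0, 1, 2): 0.6}
score, tree = best_tree(3, lambda i, k, j: toy.get((i, k, j), 0.0))
print(tree, round(math.exp(score), 2))
```

With these toy values the sketch prefers joining the first two units before attaching the third, because 0.9 × 0.6 exceeds 0.5 × 0.2. The same routine could in principle be run once within each sentence and once across sentences, which is one plausible reading of the two-stage design.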
