Text Parsing of a Complex Genre

This paper presents a text parsing component designed to be part of a system that assists students in academic reading and writing. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based on Rhetorical Structure Theory. The architecture of the parser comprises pre-processing components that provide an input text with XML annotations on different linguistic and structural layers. In the first version, these layers are syntactic tagging, lexical discourse marker tagging, logical document structure, and segmentation into elementary discourse segments. The algorithm is based on the shift-reduce parser by Marcu (2000) and is controlled by reduce operations that are constrained by linguistic conditions derived from an XML-encoded discourse marker lexicon. The constraints are formulated over multiple annotation layers of the same text.
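As a rough illustration of the parsing idea described above, the following minimal sketch shows a shift-reduce loop in which a reduce step is only permitted when a discourse marker lexicon licenses a relation. It is not the parser from the paper: the `Node` class, the `LEXICON` entries, the relation names, and the `JOINT` fallback are all hypothetical, and the real system reads its conditions from XML annotation layers rather than from a Python dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """An elementary discourse segment or a span built from two sub-spans."""
    text: str
    marker: Optional[str] = None    # discourse marker opening the span, if any
    relation: Optional[str] = None  # relation joining the children, if any
    children: tuple = ()


# Hypothetical stand-in for the XML-encoded discourse marker lexicon.
LEXICON = {"because": "CAUSE", "but": "CONTRAST", "for example": "ELABORATION"}


def try_reduce(left: Node, right: Node) -> Optional[Node]:
    """Return a merged span if the lexicon licenses a relation, else None."""
    relation = LEXICON.get(right.marker)
    if relation is None:
        return None
    return Node(left.text + " " + right.text, left.marker, relation, (left, right))


def parse(segments, default_relation="JOINT"):
    """Shift segments onto a stack and reduce whenever a constraint is met."""
    if not segments:
        raise ValueError("at least one segment is required")
    stack = []
    queue = list(segments)
    while queue:
        stack.append(queue.pop(0))  # shift
        while len(stack) >= 2:      # reduce as long as a rule applies
            merged = try_reduce(stack[-2], stack[-1])
            if merged is None:
                break
            stack[-2:] = [merged]
    # Assumed fallback: join leftover spans with an uninformative relation.
    while len(stack) > 1:
        right = stack.pop()
        left = stack.pop()
        stack.append(Node(left.text + " " + right.text, left.marker,
                          default_relation, (left, right)))
    return stack[0]
```

Calling `parse` on three segments, the second opening with "because" and the third with "but", yields a tree whose root relation comes from the last marker and whose left child holds the earlier reduction.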
