Global Features for Shallow Discourse Parsing

A coherently related group of sentences may be referred to as a discourse. In this paper we address the problem of parsing coherence relations as defined in the Penn Discourse Tree Bank (PDTB). A good model for discourse structure analysis needs to account both for local dependencies at the token-level and for global dependencies and statistics. We present techniques on using inter-sentential or sentence-level (global), data-driven, non-grammatical features in the task of parsing discourse. The parser model follows up previous approach based on using token-level (local) features with conditional random fields for shallow discourse parsing, which is lacking in structural knowledge of discourse. The parser adopts a two-stage approach where first the local constraints are applied and then global constraints are used on a reduced weighted search space (n-best). In the latter stage we experiment with different rerankers trained on the first stage n-best parses, which are generated using lexico-syntactic local features. The two-stage parser yields significant improvements over the best performing model of discourse parser on the PDTB corpus.

[1]  Richard Johansson,et al.  End-to-End Discourse Parser Evaluation , 2011, 2011 IEEE Fifth International Conference on Semantic Computing.

[2]  Ani Nenkova,et al.  Using Syntax to Disambiguate Explicit Discourse Connectives in Text , 2009, ACL.

[3]  Daniel Marcu,et al.  NP Bracketing by Maximum Entropy Tagging and SVM Reranking , 2004, EMNLP.

[4]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[5]  Richard Johansson,et al.  Shallow Discourse Parsing with Conditional Random Fields , 2011, IJCNLP.

[6]  Richard Johansson,et al.  Syntactic and Semantic Structure for Opinion Expression Detection , 2010, CoNLL.

[7]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[8]  Livio Robaldo,et al.  The Penn Discourse TreeBank 2.0. , 2008, LREC.

[9]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT' 98.

[10]  Daniel Marcu,et al.  Sentence Level Discourse Parsing using Syntactic and Lexical Information , 2003, NAACL.

[11]  Dan Klein,et al.  Fast Exact Inference with a Factored Model for Natural Language Parsing , 2002, NIPS.

[12]  Richard Johansson,et al.  Improving the Recall of a Discourse Parser by Constraint-based Postprocessing , 2012, LREC.

[13]  Mirella Lapata,et al.  Probabilistic Text Structuring: Experiments with Sentence Ordering , 2003, ACL.

[14]  Daniel Marcu,et al.  Discourse Generation Using Utility-Trained Coherence Models , 2006, ACL.

[15]  Mirella Lapata,et al.  Modeling Local Coherence: An Entity-Based Approach , 2005, ACL.

[16]  Michael Collins,et al.  Ranking Algorithms for Named Entity Extraction: Boosting and the VotedPerceptron , 2002, ACL.

[17]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[18]  B. D. Ripley COMPUTER-INTENSIVE STATISTICAL METHODS , 2011 .

[19]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[20]  FreundYoav,et al.  Large Margin Classification Using the Perceptron Algorithm , 1999 .

[21]  Julia Hirschberg,et al.  Some intonational characteristics of discourse structure , 1992, ICSLP.

[22]  Christopher D. Manning,et al.  A Global Joint Model for Semantic Role Labeling , 2008, CL.

[23]  Michael Collins,et al.  Discriminative Reranking for Natural Language Parsing , 2000, CL.

[24]  Yuji Matsumoto,et al.  Statistical Dependency Analysis with Support Vector Machines , 2003, IWPT.

[25]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[26]  Aravind K. Joshi,et al.  An SVM-based voting algorithm with application to parse reranking , 2003, CoNLL.

[27]  Mirella Lapata,et al.  Modeling Local Coherence: An Entity-Based Approach , 2005, ACL.

[28]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[29]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[30]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[31]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[32]  Robert Tibshirani,et al.  Computer‐Intensive Statistical Methods , 2006 .

[33]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[34]  John A. Carroll,et al.  Applied morphological processing of English , 2001, Natural Language Engineering.

[35]  Scott Weinstein,et al.  Centering: A Framework for Modeling the Local Coherence of Discourse , 1995, CL.