Reducing Sparsity Improves the Recognition of Implicit Discourse Relations

The earliest work on automatic detection of implicit discourse relations relied on lexical features. More recently, researchers have demonstrated that syntactic features are superior to lexical features for the task. In this paper we re-examine the two classes of state-of-the-art representations: syntactic production rules and word pair features. In particular, we focus on the need to reduce sparsity in instance representation, demonstrating that different representation choices, even within the same class of features, may exacerbate sparsity and reduce performance. We present results showing that lexicalization of the syntactic features is necessary for good performance. We introduce a novel, less sparse syntactic representation that improves discourse relation recognition. Finally, we demonstrate that classifiers trained on different representations, especially lexical ones, behave rather differently and thus could likely be combined in future systems.
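
To make the sparsity concern concrete, the following is a minimal sketch of word pair features in their usual form: the cross product of the tokens in the two arguments of a relation. It is an illustration under stated assumptions, not the representation evaluated in the paper. The function names, the `|` separator, the lowercasing, and the toy instances are all hypothetical.

```python
from collections import Counter
from itertools import product


def word_pair_features(arg1_tokens, arg2_tokens):
    """Count every (word from arg1, word from arg2) pair.

    Hypothetical helper: lowercasing and the "|" separator are
    illustrative choices, not taken from the paper.
    """
    return Counter(
        f"{w1.lower()}|{w2.lower()}"
        for w1, w2 in product(arg1_tokens, arg2_tokens)
    )


def singleton_fraction(instances):
    """Fraction of distinct features that occur in exactly one instance.

    A high value is a rough indicator of a sparse representation.
    """
    doc_freq = Counter()
    for arg1, arg2 in instances:
        # Use the key set so each feature counts once per instance.
        doc_freq.update(set(word_pair_features(arg1, arg2)))
    if not doc_freq:
        return 0.0
    singletons = sum(1 for count in doc_freq.values() if count == 1)
    return singletons / len(doc_freq)


# Toy data (invented for illustration): two argument pairs that differ
# in a single word.
toy_instances = [
    (["it", "rained"], ["the", "game", "stopped"]),
    (["it", "snowed"], ["the", "game", "stopped"]),
]

if __name__ == "__main__":
    print(word_pair_features(*toy_instances[0]))
    print(singleton_fraction(toy_instances))
```

Even in this toy case, changing one word creates three features seen in only one instance, which hints at why the number of rarely observed features grows quickly with vocabulary size.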
