Addressing Class Imbalance for Improved Recognition of Implicit Discourse Relations

In this paper we address the problem of skewed class distribution in implicit discourse relation recognition. We examine the performance of classifiers both for binary classification, which predicts whether a particular relation holds, and for multi-class prediction. We review prior work and point out that the problem has been addressed differently for the binary and multi-class settings. We demonstrate that adopting a unified approach can significantly improve the performance of multi-class prediction. We also propose an approach that makes better use of the full annotations in the training set when downsampling is used. We report significant absolute improvements in multi-class prediction, as well as significant improvements in binary classifiers for detecting the presence of implicit Temporal, Comparison and Contingency relations.
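To make the downsampling setting concrete, here is a minimal sketch of generic random undersampling for a single binary (one-vs-rest) relation classifier. It is not the authors' proposed procedure, which the abstract does not detail. The helper names `one_vs_rest_labels` and `downsample_negatives` are hypothetical, and the sketch assumes each instance carries exactly one sense label, whereas some PDTB relations are annotated with more than one sense.

```python
import random
from collections import Counter


def one_vs_rest_labels(labels, target):
    """Map multi-class sense labels to binary labels: 1 for the target relation, 0 otherwise.

    Assumes one sense label per instance (a simplification for illustration).
    """
    return [1 if y == target else 0 for y in labels]


def downsample_negatives(examples, binary_labels, seed=0):
    """Generic random undersampling (hypothetical helper, not the paper's method).

    Randomly drops negative examples until there are as many negatives as
    positives; if negatives are already no more frequent, nothing is dropped.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    if len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))  # keep a random subset of negatives
    keep = sorted(pos + neg)  # preserve the original instance order
    return [examples[i] for i in keep], [binary_labels[i] for i in keep]


# Toy usage: a skewed label distribution, with Temporal as the minority target.
senses = ["Expansion"] * 6 + ["Temporal"] * 2 + ["Contingency"] * 2
args = [f"pair{i}" for i in range(len(senses))]  # stand-ins for argument pairs
y = one_vs_rest_labels(senses, "Temporal")
X_bal, y_bal = downsample_negatives(args, y)
print(sorted(Counter(y_bal).items()))  # [(0, 2), (1, 2)]
```

Random undersampling like this discards a large share of the training annotations for each binary classifier, which is the setting in which the abstract's proposal to make better use of the full annotations applies.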

Junyi Jessy Li | Ani Nenkova
