Combining Labeled and Unlabeled Data for Learning Cross-Document Structural Relationships

Multi-document discourse analysis has emerged with the potential to improve a range of NLP applications. Based on the recently proposed Cross-document Structure Theory (CST), this paper describes an empirical study that classifies CST relationships between sentence pairs extracted from topically related documents, exploiting both labeled and unlabeled data. We investigate a binary classifier that determines whether a structural relationship exists and a full classifier that uses the complete taxonomy of relationships. We show that in both cases, exploiting unlabeled data helps improve the performance of the learned classifiers.
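One common way to exploit unlabeled data is self-training: train on the labeled pairs, label the unlabeled pairs the classifier is most confident about, add them to the training set, and retrain. The sketch below illustrates that generic idea only and is not the authors' method. The pair features (word-overlap cosine and length ratio), the nearest-centroid classifier, the margin-based confidence, and the names `pair_features`, `NearestCentroid`, and `self_train` are all hypothetical choices made for brevity. The same code handles a binary label set (relationship vs. none) or a larger set of relationship labels.

```python
import math
from collections import Counter


def pair_features(s1, s2):
    """Hypothetical features for a sentence pair: word-overlap cosine and length ratio."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    cos = dot / norm if norm else 0.0
    la, lb = sum(a.values()), sum(b.values())
    ratio = min(la, lb) / max(la, lb) if max(la, lb) else 0.0
    return (cos, ratio)


class NearestCentroid:
    """Toy classifier: assigns the label whose feature centroid is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict_with_margin(self, x):
        # Confidence = gap between the closest and second-closest centroid.
        dists = sorted((math.dist(x, c), lab) for lab, c in self.centroids.items())
        best_d, best_lab = dists[0]
        margin = dists[1][0] - best_d if len(dists) > 1 else float("inf")
        return best_lab, margin


def self_train(labeled, unlabeled, rounds=5, per_round=2):
    """Generic self-training loop over (features, label) pairs plus unlabeled features."""
    X = [x for x, _ in labeled]
    y = [lab for _, lab in labeled]
    pool = list(unlabeled)
    clf = NearestCentroid().fit(X, y)
    for _ in range(rounds):
        if not pool:
            break
        # Rank unlabeled examples by classifier confidence (largest margin first).
        ranked = sorted(range(len(pool)),
                        key=lambda i: clf.predict_with_margin(pool[i])[1],
                        reverse=True)
        take = set(ranked[:per_round])
        for i in take:
            X.append(pool[i])
            y.append(clf.predict_with_margin(pool[i])[0])  # pseudo-label
        pool = [x for i, x in enumerate(pool) if i not in take]
        clf = NearestCentroid().fit(X, y)  # retrain on labeled + pseudo-labeled data
    return clf


if __name__ == "__main__":
    labeled = [((0.9, 0.9), "related"), ((0.1, 0.5), "none")]
    unlabeled = [(0.8, 0.85), (0.2, 0.4), (0.85, 0.95), (0.15, 0.6)]
    clf = self_train(labeled, unlabeled, rounds=3, per_round=2)
    print(clf.predict_with_margin(pair_features("the cat sat", "the cat sat"))[0])
```

Selecting only the highest-margin predictions in each round is a standard guard against feeding the classifier its own errors. The round count and batch size here are arbitrary illustrative values.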

Dragomir R. Radev | Zhu Zhang