Enhancement of a Chinese Discourse Marker Tagger with C4.5

Discourse markers are complex discontinuous linguistic expressions which are used to explicitly signal the discourse structure of a text. This paper describes efforts to improve an automatic tagging system which identifies and classifies discourse markers in Chinese texts by applying machine learning (ML) to the disambiguation of discourse markers, as an integral part of automatic text summarization via rhetorical structure. Encouraging results are reported.