Discursive Usage of Six Chinese Punctuation Marks

Both rhetorical structure and punctuation have proven helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of six Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against the patterns around cue phrases in general. Results show that these Chinese punctuation marks, though fewer in number than cue phrases, are easy to identify, correlate strongly with certain relations, and can be used as distinctive indicators of nuclearity in Chinese texts.
