Paper Information - A Grammatico-Statistical Approach to Discourse Partitioning

The paper presents a new approach to text segmentation, the task of dividing a text into coherent discourse units. The approach builds on the theory of discourse segments (Nomoto and Nitta, 1993) and incorporates ideas from information retrieval research (Salton, 1988). A discourse segment reflects the structure of Japanese discourse: it can be thought of as a linguistic unit demarcated by wa, the Japanese topic particle, and may extend over several sentences. The segmentation operates on discourse segments and uses a coherence measure based on tf-idf, a standard information retrieval weighting (Salton, 1988; Hearst, 1993). Experiments were conducted on a Japanese newspaper corpus, and the approach proved quite successful at recovering articles from the unstructured corpus.
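To make the idea of a tf-idf based coherence measure concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not give their formula. The sketch assumes that discourse segments are already available as token lists, that coherence is the cosine similarity between the tf-idf vectors of adjacent segments, and that a break is proposed wherever the similarity falls below a fixed threshold. The function names (`tfidf_vectors`, `cosine`, `boundaries`), the threshold value, and the toy English tokens are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Return one tf-idf weighted term vector (a dict) per tokenized segment."""
    n = len(segments)
    # Document frequency: the number of segments that contain each term.
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        # Assumed weighting: raw term frequency times log(n / df).
        # A term found in every segment gets weight 0.
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def boundaries(segments, threshold=0.1):
    """Indices i where a break is proposed between segment i and segment i + 1."""
    vecs = tfidf_vectors(segments)
    return [
        i
        for i in range(len(vecs) - 1)
        if cosine(vecs[i], vecs[i + 1]) < threshold  # hypothetical cutoff
    ]


# Toy data standing in for pre-tokenized discourse segments (an assumption).
segments = [
    ["bank", "rate", "loan"],
    ["bank", "loan", "interest"],
    ["team", "match", "goal"],
    ["team", "goal", "coach"],
]
print(boundaries(segments))
```

In this toy input the first two segments share vocabulary, as do the last two, so the only proposed break falls between the second and third segments. For real Japanese text, a morphological analyzer would be needed to produce the token lists, and the threshold would have to be tuned on held-out data.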

Tadashi Nomoto | Yoshihiko Nitta

[1] G. Youmans. A New Tool for Discourse Analysis: The Vocabulary-Management Profile , 1991 .

[2] William C. Mann,et al. Rhetorical Structure Theory: A Framework for the Analysis of Texts , 1987 .

[3] Hideki Kozima,et al. Text Segmentation Based on Similarity between Words , 1993, ACL.

[4] Tadashi Nomoto,et al. Resolving Zero Anaphora in Japanese , 1993, EACL.

[5] Geoffrey Nunberg,et al. The linguistics of punctuation , 1990 .

[6] Marti A. Hearst. TextTiling: A Quantitative Approach to Discourse , 1993 .

[7] Candace L. Sidner,et al. Attention, Intentions, and the Structure of Discourse , 1986, CL.

[8] Takenobu Tokunaga,et al. Text Categorization based on Weighted Inverse Document Frequency , 1994 .

[9] Gerard Salton,et al. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[10] Rebecca J. Passonneau,et al. Intention-Based Segmentation: Human Reliability and Correlation with Linguistic Cues , 1993, ACL.

[11] Eduard H. Hovy. Parsimonious and Profligate Approaches to the Question of Discourse Structure Relations , 1990, INLG.

[12] Gerard Salton,et al. Automatic text processing , 1988 .