Segmentation of Chinese Discourse in Content-Based Information Retrieval

In this paper, we present a novel approach to automatic discourse segmentation that does not require full semantic understanding. To analyse the textual bonds and determine the degree of coherence that a discourse exhibits, we first represent the wide diversity of textual relations as a discourse network. We encode a set of mutual linguistic constraints that largely determine the similarity of meaning among lexical items. Topic boundaries in a discourse are identified through a computational method that identifies segment clusters from a higher-order structure in the discourse network. Segmentation is thus regarded as a process of identifying the shifts from one segment cluster to another. Experimental results show that our formulation is capable of detecting topic shifts in texts. Comparison with a related method demonstrates that the combination of constraints is closely related to the topic boundaries among textual segments. Evaluation using recall and precision shows the effectiveness of our approach on a collection of Chinese newswire articles.
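As an illustration of the recall and precision evaluation mentioned above, the following is a minimal sketch that assumes topic boundaries are represented as sentence indices and that a predicted boundary counts as correct only when it matches a reference boundary exactly. The function name, the index representation, and the exact-match criterion are assumptions made for this example; the abstract does not specify the matching rule used in the experiments.

```python
def boundary_precision_recall(predicted, reference):
    """Exact-match precision and recall over boundary positions.

    `predicted` and `reference` are iterables of sentence indices at
    which a topic boundary is placed (a hypothetical representation).
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)  # boundaries found in both sets
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Two of three predicted boundaries match; two of four reference
    # boundaries are recovered.
    p, r = boundary_precision_recall([3, 7, 12], [3, 8, 12, 15])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; a tolerance window around each reference boundary would be a common relaxation, but it is not assumed here.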
