Using Collocations for Topic Segmentation and Link Detection

In this paper, we present a method that performs two topic analysis tasks in an integrated way: segmentation and link detection. The method combines word repetition with the lexical cohesion captured by a collocation network, so that each approach compensates for the weaknesses of the other. We evaluate the segmentation part of the method on two corpora, one in French and one in English, and we propose an evaluation measure specifically suited to this kind of system.
