Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation

Topic segmentation systems that rely on lexical repetition suffer from reliability problems, and language-dependent systems suffer from adaptability problems. To address both, we present a context-based topic segmentation system built on a new informative similarity measure derived from word co-occurrence. In an evaluation against the state of the art in the domain, namely the C99 and TextTiling algorithms, our system shows improved results both with and without the identification of multiword units.
