Finding document topics for improving topic segmentation

Topic segmentation and topic identification are often tackled as separate problems, although both are part of topic analysis. In this article, we study how topic identification can improve a topic segmenter based on word reiteration. We first present an unsupervised method for discovering the topics of a text. We then detail how the segmenter uses these topics to find topical similarities between text segments. Finally, we demonstrate the value of the proposed method through an evaluation conducted on both French and English.
