Topic Detection Using Lexical Chains

This paper describes an algorithm for identifying the topics of unrestricted texts. The algorithm takes as input text segments, each a group of contiguous portions of the text, and discovers lexical chains that serve as indicators of the segments' topics. Two implementations based on publicly available resources are presented: one uses WordNet and the other uses Roget's thesaurus. The evaluation shows that lexical chains are acceptable topic indicators, achieving 45% precision and 65% recall.
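To make the idea of a lexical chain concrete, the following is a minimal sketch, not the paper's implementation. It assumes a tiny hand-written thesaurus (`TOY_THESAURUS`, hypothetical) in place of WordNet or Roget's thesaurus, a greedy chaining rule (a word joins the first chain that contains a related word), and chain length as the measure of chain strength. All names and the chaining and scoring rules are illustrative assumptions.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus standing in for WordNet or Roget's thesaurus:
# each word maps to the set of words it is treated as related to.
TOY_THESAURUS: Dict[str, Set[str]] = {
    "car": {"vehicle", "engine", "wheel"},
    "vehicle": {"car", "truck"},
    "engine": {"car", "motor"},
    "bank": {"money", "loan"},
    "money": {"bank", "loan"},
}


def related(a: str, b: str, thesaurus: Dict[str, Set[str]]) -> bool:
    """Treat two words as related if they are identical or linked in either direction."""
    return a == b or b in thesaurus.get(a, set()) or a in thesaurus.get(b, set())


def build_chains(words: List[str], thesaurus: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedily group the words of one segment into lexical chains (assumed rule)."""
    chains: List[List[str]] = []
    for word in words:
        for chain in chains:
            # Join the first chain that already holds a related word.
            if any(related(word, member, thesaurus) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain is related to this word: start a new chain.
            chains.append([word])
    return chains


def strongest_chain(chains: List[List[str]]) -> List[str]:
    """Pick the longest chain as the topic indicator (assumed scoring rule)."""
    return max(chains, key=len)


# Candidate words of one text segment (hypothetical example input).
segment = ["car", "engine", "bank", "vehicle", "money", "car"]
chains = build_chains(segment, TOY_THESAURUS)
topic_chain = strongest_chain(chains)
```

For this segment, the sketch groups the vehicle-related words into one chain and the finance-related words into another, and selects the longer chain as the topic indicator. A real system would replace the toy thesaurus with lookups in a lexical resource and would likely use a more refined notion of chain strength.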
