Advances in domain independent linear text segmentation

This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state-of-the-art (Reynar, 1998). Inter-sentence similarity is replaced by rank in the local context. Boundary locations are discovered by divisive clustering.
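The two ideas named in the abstract, replacing raw similarity with a local rank and finding boundaries by divisive clustering, can be illustrated with a minimal Python sketch. This is not the paper's implementation. The function names, the `radius` window parameter, the tie handling, the density score (sum of within-segment values divided by segment area), and the fixed `num_segments` stopping rule are all assumptions made for illustration.

```python
def rank_transform(sim, radius=1):
    """Replace each similarity value by its rank in a local window.

    The rank is the fraction of neighbouring cells (within `radius`,
    an assumed parameter) whose value is strictly lower.
    """
    n = len(sim)
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = 0
            total = 0
            for a in range(max(0, i - radius), min(n, i + radius + 1)):
                for b in range(max(0, j - radius), min(n, j + radius + 1)):
                    if a == i and b == j:
                        continue  # skip the cell itself
                    total += 1
                    if sim[a][b] < sim[i][j]:
                        lower += 1
            ranks[i][j] = lower / total if total else 0.0
    return ranks


def block_sum(m, start, end):
    """Sum of the square sub-matrix covering sentences start..end-1."""
    return sum(m[i][j] for i in range(start, end) for j in range(start, end))


def divisive_segment(m, num_segments):
    """Split the document top-down until `num_segments` segments exist.

    Each step adds the single cut that maximises the assumed density
    score: total within-segment sum divided by total within-segment area.
    Returns the sorted boundary positions (index of the first sentence
    of each new segment).
    """
    n = len(m)
    segments = [(0, n)]
    while len(segments) < num_segments:
        total_sum = sum(block_sum(m, s, e) for s, e in segments)
        total_area = sum((e - s) ** 2 for s, e in segments)
        best = None
        for idx, (s, e) in enumerate(segments):
            old_sum = block_sum(m, s, e)
            for cut in range(s + 1, e):
                new_sum = (total_sum - old_sum
                           + block_sum(m, s, cut) + block_sum(m, cut, e))
                new_area = (total_area - (e - s) ** 2
                            + (cut - s) ** 2 + (e - cut) ** 2)
                density = new_sum / new_area
                if best is None or density > best[0]:
                    best = (density, idx, cut)
        if best is None:
            break  # every segment is a single sentence; nothing to split
        _, idx, cut = best
        s, e = segments[idx]
        segments[idx:idx + 1] = [(s, cut), (cut, e)]
    return [e for _, e in segments[:-1]]
```

In a full pipeline one would build a sentence-by-sentence similarity matrix, pass it through `rank_transform`, and hand the result to `divisive_segment`. On a toy six-sentence matrix in which sentences 0 to 2 and 3 to 5 are similar only within their own group, calling `divisive_segment` directly with `num_segments=2` places the single boundary at position 3.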

[1] Seiji Miike, et al. A full-text retrieval system with a dynamic abstract generation function, 1994, SIGIR '94.

[2] Stefan Kaufmann. Cohesion and Collocation: Using Context Vectors in Text Segmentation, 1999, ACL.

[3] Makoto Nagao, et al. Automatic Detection of Discourse Structure by Checking Surface Information in Sentences, 1994, COLING.

[4] Hideki Kozima, et al. Text Segmentation Based on Similarity between Words, 1993, ACL.

[5] Breck Baldwin, et al. EAGLE: An Extensible Architecture for General Linguistic Engineering, 1997, ANLP.

[6] John D. Lafferty, et al. Text Segmentation Using Exponential Models, 1997, EMNLP.

[7] Filippo Menczer, et al. A cluster-based approach to tracking, detection and segmentation of broadcast news, 1999.

[8] Oskari Heinonen, et al. Optimal Multi-Paragraph Text Segmentation by Dynamic Programming, 1998, ACL.

[9] John D. Lafferty, et al. A Model of Lexical Attraction and Repulsion, 1997, ACL.

[10] Yaakov Yaari, et al. Segmentation of Expository Texts by Hierarchical Agglomerative Clustering, 1997, ArXiv.

[11] William H. Press, et al. Numerical recipes in C, 2002.

[12] G. Youmans. A New Tool for Discourse Analysis: The Vocabulary-Management Profile, 1991.

[13] W. Bruce Croft, et al. Text Segmentation by Topic, 1997, ECDL.

[14] Pasi Tapanainen, et al. What is a word, What is a sentence? Problems of Tokenization, 1994.

[15] Graeme Hirst, et al. Lexical Cohesion Computed by Thesaural relations as an indicator of the structure of text, 1991, CL.

[16] Alexander Dekhtyar, et al. Information Retrieval, 2018, Lecture Notes in Computer Science.

[17] Min-Yen Kan, et al. Linear Segmentation and Segment Significance, 1998, VLC@COLING/ACL.

[18] Jeffrey C. Reynar. Statistical Models for Topic Segmentation, 1999, ACL.

[19] Christian Plaunt, et al. Subtopic structuring for full-length document access, 1993, SIGIR.

[20] Yiming Yang, et al. Topic Detection and Tracking Pilot Study Final Report, 1998.

[21] Michael Halliday, et al. Cohesion in English, 1976.

[22] Jonathan Helfman, et al. Dotplot Patterns: A Literal Look at Pattern Languages, 1996, Theory Pract. Object Syst.

[23] F.Y.Y. Choi. A speech interface for rapid reading, 2000.

[24] Mitchell P. Marcus, et al. Topic segmentation: algorithms and applications, 1998.

[25] F. A. Seiler, et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.

[26] Adwait Ratnaparkhi, et al. A Maximum Entropy Approach to Identifying Sentence Boundaries, 1997, ANLP.

[27] Manabu Okumura, et al. Text Segmentation with Multiple Surface Linguistic Cues, 1998, COLING-ACL.

[28] Marti A. Hearst, et al. Adaptive Sentence Boundary Disambiguation, 1994, ANLP.

[29] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[30] Jeffrey C. Reynar. An Automatic Method of Finding Topic Boundaries, 1994, ACL.

[31] Kenneth Ward Church, et al. Dotplot: a program for exploring self-similarity in millions of lines of text and code, 1993.

[32] Kenneth Ward Church. Char_align: A Program for Aligning Parallel Texts at the Character Level, 1993, ACL.

[33] Marti A. Hearst. Multi-Paragraph Segmentation of Expository Text, 1994, ACL.

[34] Rebecca J. Passonneau, et al. Combining Multiple Knowledge Sources for Discourse Segmentation, 1995, ACL.

[35] Mark A. O'Neill, et al. Practical approach to the stereo matching of urban imagery, 1992, Image Vis. Comput.

[36] Manabu Okumura, et al. Text Segmentation with Multiple Surface Linguistic Cues, 1999, COLING.