Text segmentation of spoken meeting transcripts

Text segmentation has played an important role in information retrieval and natural language processing. Current segmentation methods are well suited to written, structured texts because they exploit the distinctive macro-level structures of such texts; however, segmenting transcribed multi-party conversation presents a different challenge, given its ill-formed sentences and lack of macro-level text units. This paper describes an algorithm for segmenting spoken meeting transcripts that combines semantically complex lexical relations with speech cue phrases to build lexical chains, which are then used to determine topic boundaries.
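
To make the general idea concrete, the following is a minimal sketch of lexical-chain segmentation with cue phrases. It is not the algorithm described in the paper: it assumes that chain links are plain word repetitions (standing in for the semantically complex lexical relations mentioned above), and the cue-phrase list, stop-word list, weights, threshold, and all function names are hypothetical choices made for illustration only.

```python
from collections import defaultdict

# Hypothetical cue phrases and stop words, chosen only for illustration.
CUE_PHRASES = ("okay", "so", "anyway", "moving on", "next")
STOP_WORDS = {"the", "a", "an", "is", "are", "we", "to", "of", "and", "it",
              "for", "on", "in", "that", "this", "so", "okay", "next",
              "anyway", "moving"}


def content_words(utterance):
    """Lowercase the utterance, strip punctuation, and drop stop words."""
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def starts_with_cue(utterance):
    """True if the utterance opens with one of the cue phrases."""
    normalized = " ".join(w.strip(".,?!").lower() for w in utterance.split()) + " "
    return any(normalized.startswith(cue + " ") for cue in CUE_PHRASES)


def build_chains(utterances, max_gap=2):
    """Return (start, end) utterance spans of simplified lexical chains.

    Assumption: a chain links repetitions of the same word that occur at
    most max_gap utterances apart, and must span at least two utterances.
    """
    positions = defaultdict(list)
    for i, utt in enumerate(utterances):
        for w in set(content_words(utt)):
            positions[w].append(i)
    chains = []
    for idxs in positions.values():
        start = prev = idxs[0]
        for i in idxs[1:] + [None]:  # None flushes the last open chain
            if i is None or i - prev > max_gap:
                if prev > start:
                    chains.append((start, prev))
                start = i
            prev = i
    return chains


def boundary_scores(utterances, chains, cue_weight=2):
    """Score each gap; scores[i] is the gap between utterances i and i + 1."""
    n = len(utterances)
    scores = [0] * (n - 1)
    for start, end in chains:
        if end < n - 1:
            scores[end] += 1        # a chain ends just before this gap
        if start > 0:
            scores[start - 1] += 1  # a chain starts just after this gap
    for i in range(1, n):
        if starts_with_cue(utterances[i]):
            scores[i - 1] += cue_weight
    return scores


def segment(utterances, threshold=4):
    """Return indices of utterances that open a new topic segment."""
    scores = boundary_scores(utterances, build_chains(utterances))
    return [i + 1 for i, s in enumerate(scores) if s >= threshold]


EXAMPLE = [
    "The budget needs approval this week.",
    "I think the budget covers travel costs.",
    "Travel costs are inside the budget.",
    "Okay, moving on to the release schedule.",
    "The release schedule depends on testing.",
    "Testing should finish before the release.",
]
```

On this toy transcript, `segment(EXAMPLE)` places a single boundary before the fourth utterance, where the "budget" chains end, the "release" chains begin, and the utterance opens with a cue phrase. A fuller implementation would replace exact repetition with richer lexical relations, for example synonymy drawn from a thesaurus.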
