A Probabilistic Model of Meetings That Combines Words and Discourse Features

To determine where meeting discourse shifts from one topic to another, probabilistic models were used to approximate the process by which meeting transcripts are produced. Gibbs sampling was used to estimate the values of the random variables in these models, including the locations of topic boundaries. This paper shows how discourse features were integrated into the Bayesian model and reports empirical evaluations of two questions: how much benefit each feature provides, and how suitable alternative models of topic boundary placement are. It demonstrates that multiple cues to segmentation can be combined in a principled way, and empirical tests show a clear improvement over previous work.
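As a rough illustration of how Gibbs sampling can estimate boundary locations, the sketch below resamples one binary boundary indicator at a time, conditioned on all the others. It is not the model described in this paper: it assumes a much simpler setup in which each segment has its own word distribution with a symmetric Dirichlet prior (integrated out analytically) and each boundary has an independent Bernoulli prior, with no discourse features. The function names and the parameters `beta`, `pi`, and `sweeps` are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def log_marginal(counts, vocab_size, beta):
    """Log Dirichlet-multinomial marginal likelihood of one segment's words."""
    total = sum(counts.values())
    result = math.lgamma(vocab_size * beta) - math.lgamma(vocab_size * beta + total)
    for n in counts.values():
        result += math.lgamma(beta + n) - math.lgamma(beta)
    return result


def segment_counts(utterances, start, end):
    """Word counts over utterances[start:end]."""
    counts = Counter()
    for utt in utterances[start:end]:
        counts.update(utt)
    return counts


def gibbs_boundaries(utterances, vocab_size, beta=0.1, pi=0.2, sweeps=200, seed=0):
    """Estimate the posterior probability of a topic boundary before each utterance.

    Simplifying assumptions (not taken from the paper): one word distribution
    per segment, Bernoulli(pi) prior on every boundary, no discourse features.
    """
    rng = random.Random(seed)
    n = len(utterances)
    boundary = [False] * n  # boundary[u]: a new segment starts at utterance u
    tallies = [0] * n
    burn_in = sweeps // 2
    for sweep in range(sweeps):
        for u in range(1, n):
            # Locate the segments on either side of u, ignoring boundary[u].
            left = u - 1
            while left > 0 and not boundary[left]:
                left -= 1
            right = u + 1
            while right < n and not boundary[right]:
                right += 1
            log_split = (math.log(pi)
                         + log_marginal(segment_counts(utterances, left, u), vocab_size, beta)
                         + log_marginal(segment_counts(utterances, u, right), vocab_size, beta))
            log_merge = (math.log(1.0 - pi)
                         + log_marginal(segment_counts(utterances, left, right), vocab_size, beta))
            diff = log_merge - log_split
            # Guard against overflow in exp for very large differences.
            p_split = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))
            boundary[u] = rng.random() < p_split
        if sweep >= burn_in:
            for u in range(1, n):
                tallies[u] += boundary[u]
    kept = sweeps - burn_in
    return [t / kept for t in tallies]
```

Calling `gibbs_boundaries` on a list of utterances, each given as a list of integer word ids, returns one estimated boundary probability per utterance. A model closer to the one summarized above would replace the per-segment word distribution with a mixture over shared topics and let the boundary prior depend on discourse features.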
