Unsupervised Topic Modelling for Multi-Party Spoken Discourse

We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003) while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors.

[1]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[2]  Andrei Popescu-Belis,et al.  User Query Analysis for the Specification and Evaluation of a Dialogue Processing and Retrieval System , 2004, LREC.

[3]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[4]  Jeffrey C. Reynar Statistical Models for Topic Segmentation , 1999, ACL.

[5]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[7]  Gerhard Rigoll,et al.  Segmentation and classification of meeting events using multiple classifier fusion and dynamic programming , 2004, ICPR 2004.

[8]  Julia Hirschberg,et al.  Automatic summarization of broadcast news using structural features , 2003, INTERSPEECH.

[9]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[10]  Marti A. Hearst Multi-Paragraph Segmentation Expository Text , 1994, ACL.

[11]  Frank Keller,et al.  Using the Web to Obtain Frequencies for Unseen Bigrams , 2003, CL.

[12]  Scott Bennett,et al.  Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies , 1995, ACL.

[13]  Hwee Tou Ng,et al.  A Machine Learning Approach to Coreference Resolution of Noun Phrases , 2001, CL.

[14]  Ralph Grishman,et al.  Extracting Relations with Integrated Information Using Kernel Methods , 2005, ACL.

[15]  Sylvia Richardson,et al.  Markov Chain Monte Carlo in Practice , 1997 .

[16]  Jerry R. Hobbs Resolving pronoun references , 1986 .

[17]  Richard M. Schwartz,et al.  Improved topic discrimination of broadcast news using a model of multiple simultaneous topics , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18]  Marti A. Hearst,et al.  A Critique and Improvement of an Evaluation Metric for Text Segmentation , 2002, CL.

[19]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[20]  Xiaoqiang Luo,et al.  Multi-Lingual Coreference Resolution With Syntactic Features , 2005, HLT/EMNLP.

[21]  Alexander I. Rudnicky,et al.  Using simple speech-based features to detect the state of a meeting and the roles of the meeting participants , 2004, INTERSPEECH.

[22]  Shalom Lappin,et al.  An Algorithm for Pronominal Anaphora Resolution , 1994, CL.

[23]  John D. Lafferty,et al.  Statistical Models for Text Segmentation , 1999, Machine Learning.

[24]  Steve Renals,et al.  Dynamic Bayesian networks for meeting structuring , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[26]  Jian Su,et al.  Improving Pronoun Resolution by Incorporating Coreferential Information of Candidates , 2004, ACL.

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  Claire Gardent,et al.  Improving Machine Learning Approaches to Coreference Resolution , 2002, ACL.

[29]  Branimir Boguraev,et al.  Anaphora for Everyone: Pronominal Anaphora Resolution without a Parser , 1996, COLING.

[30]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[31]  Ruslan Mitkov,et al.  Robust Pronoun Resolution with Limited Knowledge , 1998, ACL.

[32]  Eric Fosler-Lussier,et al.  Discourse Segmentation of Multi-Party Conversation , 2003, ACL.

[33]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[34]  David M. Blei,et al.  Topic segmentation with an aspect hidden Markov model , 2001, SIGIR '01.

[35]  Carolyn Penstein Rosé,et al.  The Necessity of a Meeting Recording and Playback System, and the Benefit of Topic-Level Annotations to Meeting Browsing , 2005, INTERACT.

[36]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .