SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations

A key task in analyzing conversational data is dividing it into topically coherent segments. However, most models of topic segmentation ignore the social aspect of conversations, focusing only on the words used. We introduce a hierarchical Bayesian nonparametric model, Speaker Identity for Topic Segmentation (SITS), that discovers (1) the topics used in a conversation, (2) how these topics are shared across conversations, (3) when these topics shift, and (4) each speaker's tendency to introduce new topics. We evaluate SITS against current unsupervised segmentation models and show that including person-specific information improves segmentation performance on meeting corpora and on political debates. Moreover, we provide evidence that SITS captures an individual's tendency to introduce new topics in political contexts, via analysis of the 2008 US presidential debates and the television program Crossfire.
