Content Modeling Using Latent Permutations

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
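To make the permutation distribution concrete, the following is a minimal sketch of sampling from a Generalized Mallows Model. It assumes the standard inversion-count formulation, in which an ordering of K topics is encoded by counts v_j (the number of topics greater than j that precede j), and each v_j is drawn independently with probability proportional to exp(-rho_j * v_j). The function names, the 0-based topic indices, and the dispersion values are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_inversion_counts(rho, rng):
    """Draw inversion counts v_j with P(v_j = k) proportional to exp(-rho[j] * k).

    rho holds one dispersion parameter per topic except the last
    (hypothetical values; larger rho keeps orderings nearer the canonical one).
    """
    num_topics = len(rho) + 1
    counts = []
    for j, r in enumerate(rho):
        max_count = num_topics - 1 - j  # topics larger than j that could precede it
        weights = [math.exp(-r * k) for k in range(max_count + 1)]
        counts.append(rng.choices(range(max_count + 1), weights=weights)[0])
    return counts


def inversions_to_permutation(counts):
    """Rebuild the topic ordering: insert topics from largest to smallest,
    placing topic j after exactly counts[j] larger topics."""
    num_topics = len(counts) + 1
    perm = [num_topics - 1]
    for j in range(num_topics - 2, -1, -1):
        perm.insert(counts[j], j)
    return perm


def permutation_to_inversions(perm):
    """Inverse mapping: count larger topics that appear before each topic j."""
    position = {topic: i for i, topic in enumerate(perm)}
    return [
        sum(1 for other in perm[: position[j]] if other > j)
        for j in range(len(perm) - 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    rho = [1.0, 1.0, 1.0, 1.0]  # assumed dispersion values for 5 topics
    for _ in range(3):
        v = sample_inversion_counts(rho, rng)
        print(v, inversions_to_permutation(v))
```

Because the counts are independent, the normalizing constant factorizes over topics, which is what makes this representation convenient inside a sampler. All-zero counts correspond to the canonical ordering, and larger dispersion values concentrate probability mass near it.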
