Fixed-Length Poisson MRF: Adding Dependencies to the Multinomial

We propose a novel distribution that generalizes the Multinomial distribution to allow dependencies between dimensions. The distribution is based on the parametric form of the Poisson MRF model [1] but differs fundamentally in that its domain is restricted to fixed-length vectors, as in a Multinomial, where the number of trials is fixed or known. We therefore call it the Fixed-Length Poisson MRF (LPMRF) distribution. We develop annealed importance sampling (AIS) methods to estimate the likelihood and the log partition function (i.e., the log normalizing constant); no such estimators were developed for the Poisson MRF model. In addition, we propose novel mixture and topic models that use LPMRF as a base distribution, and we discuss their similarities to and differences from previous topic models such as the recently proposed Admixture of Poisson MRFs [2]. We demonstrate the advantage of LPMRF over Multinomial models by evaluating test-set perplexity on a dataset of abstracts and on Wikipedia. Qualitatively, we show that the positive dependencies discovered by LPMRF are interesting and intuitive. Finally, we show that our algorithms are fast and scale well (code available online).
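To make the fixed-length restriction concrete, the following is a minimal sketch, not the AIS estimator described above. It assumes the unnormalized log-density takes the Poisson MRF form, theta^T x + x^T Phi x - sum_i log(x_i!), restricted to count vectors whose entries sum to a fixed length L. The names `theta`, `phi`, `log_unnormalized`, and `log_partition` are illustrative, and the log partition function is computed by brute-force enumeration, which is feasible only for tiny problems. With `phi` set to zero the model should reduce to a Multinomial, which gives a closed form to check against.

```python
import math

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def log_unnormalized(x, theta, phi):
    """Assumed form: theta^T x + x^T phi x - sum_i log(x_i!)."""
    linear = sum(t * xi for t, xi in zip(theta, x))
    quadratic = sum(x[i] * phi[i][j] * x[j]
                    for i in range(len(x)) for j in range(len(x)))
    log_base = -sum(math.lgamma(xi + 1) for xi in x)  # lgamma(n + 1) = log(n!)
    return linear + quadratic + log_base

def log_partition(theta, phi, length):
    """Exact log normalizer by enumerating all vectors with sum == length."""
    terms = [log_unnormalized(x, theta, phi)
             for x in compositions(length, len(theta))]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

def multinomial_log_partition(theta, length):
    """Closed form when phi == 0: L * log(sum_i exp(theta_i)) - log(L!)."""
    return length * math.log(sum(math.exp(t) for t in theta)) - math.lgamma(length + 1)

if __name__ == "__main__":
    theta = [0.1, -0.3, 0.5]
    zero = [[0.0] * 3 for _ in range(3)]
    print(log_partition(theta, zero, 4), multinomial_log_partition(theta, 4))
```

The number of terms in the enumeration grows combinatorially with the dimension and the length, which is why a sampling-based estimate such as AIS is needed at realistic vocabulary sizes.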

[1] P. Altham, et al. Two Generalizations of the Binomial Distribution, 1978.

[2] Thomas Hofmann, et al. Probabilistic Latent Semantic Analysis, 1999, UAI.

[3] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[4] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[5] Thomas L. Griffiths, et al. Probabilistic Topic Models, 2007.

[6] William W. Cohen, et al. Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models, 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[7] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.

[8] David M. Blei, et al. Probabilistic topic models, 2012, Commun. ACM.

[9] Timothy Baldwin, et al. Evaluating topic models for digital libraries, 2010, JCDL '10.

[10] David M. Blei, et al. Bayesian Checking for Topic Models, 2011, EMNLP.

[11] Andrew McCallum, et al. Optimizing Semantic Coherence in Topic Models, 2011, EMNLP.

[12] Pradeep Ravikumar, et al. Graphical Models via Generalized Linear Models, 2012, NIPS.

[13] Pradeep Ravikumar, et al. On Poisson Graphical Models, 2013, NIPS.

[14] Mark Stevenson, et al. Evaluating Topic Coherence Using Distributional Semantics, 2013, IWCS.

[15] Pradeep Ravikumar, et al. Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs, 2014, NIPS.

[16] Pradeep Ravikumar, et al. Admixture of Poisson MRFs: A Topic Model with Word Dependencies, 2014, ICML.