Probabilistic Models for Text Mining

Several probabilistic models, such as latent Dirichlet allocation (LDA), hidden Markov models, and Markov random fields, have emerged in recent years for the analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. It focuses primarily on the fundamental probabilistic techniques and also covers their applications to a range of text mining problems, including topic modeling, language modeling, document classification, document clustering, and information extraction.
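
To make one of these techniques concrete, the following is a minimal sketch of Viterbi decoding for a hidden Markov model: given a sequence of observed words, it recovers the most likely sequence of hidden labels, as an HMM-based information extraction system does when tagging entities. It assumes a small, fully specified discrete HMM whose parameters are already known. The `NAME`/`OTHER` tag set, the three-word vocabulary, and every probability below are hypothetical values chosen for illustration, not parameters from the chapter, and learning the parameters is not shown.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a non-empty `obs` under a discrete HMM.

    start_p[s], trans_p[r][s] and emit_p[s][word] are plain probabilities;
    unseen words get emission probability 0. Scores are kept in log space
    so long sequences do not underflow.
    """
    # delta[t][s]: best log-probability of any state path ending in s at step t
    delta = [{s: _log(start_p[s]) + _log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor r that maximizes delta[t-1][r] + log P(s | r)
            prev = max(states, key=lambda r: delta[t - 1][r] + _log(trans_p[r][s]))
            delta[t][s] = (delta[t - 1][prev] + _log(trans_p[prev][s])
                           + _log(emit_p[s].get(obs[t], 0.0)))
            backptr[t][s] = prev
    # Start from the best final state and follow back-pointers to step 0
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


# Hypothetical toy tagger: tags and probabilities are made up for illustration.
STATES = ["NAME", "OTHER"]
START = {"NAME": 0.3, "OTHER": 0.7}
TRANS = {"NAME": {"NAME": 0.6, "OTHER": 0.4},
         "OTHER": {"NAME": 0.2, "OTHER": 0.8}}
EMIT = {"NAME": {"alice": 0.5, "bob": 0.4, "met": 0.1},
        "OTHER": {"alice": 0.05, "bob": 0.05, "met": 0.9}}

if __name__ == "__main__":
    tags, log_prob = viterbi(["alice", "met", "bob"], STATES, START, TRANS, EMIT)
    print(tags, math.exp(log_prob))
```

With these toy parameters, the sentence "alice met bob" decodes to NAME, OTHER, NAME, and the winning path has probability 0.3 × 0.5 × 0.4 × 0.9 × 0.2 × 0.4 = 0.00432. Because the recursion keeps only the best predecessor for each state at each step, the cost is linear in sequence length and quadratic in the number of states. Enumerating every possible tag sequence would instead grow exponentially with sequence length.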
