MedLDA: maximum margin supervised topic models

A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive, low-dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model, which integrates the mechanism behind max-margin prediction models (e.g., SVMs) with that behind hierarchical Bayesian topic models (e.g., LDA) under a unified constrained optimization framework, and yields latent topical representations that are more discriminative and better suited to prediction tasks such as document classification or regression. The principle underlying the MedLDA formalism is quite general and can be applied to joint max-margin and maximum likelihood learning of directed or undirected topic models when supervising side information is available. Efficient variational methods for posterior inference and parameter estimation are derived, and extensive empirical studies on several real data sets are provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA can: 1) discover sparse and highly discriminative topical representations; 2) achieve state-of-the-art prediction performance; and 3) be more efficient than existing supervised topic models, especially for classification.
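To make the constrained-optimization view concrete, the following is a schematic sketch of the kind of objective MedLDA solves in the classification setting. The notation here (the discriminant features f, the average topic-assignment vector z̄_d, the label loss Δℓ_d(y), the slack variables ξ_d, and the regularization constant C) is assumed for illustration rather than quoted from the abstract:

\[
\min_{q,\,\alpha,\,\beta,\,\xi}\; \mathcal{L}(q;\alpha,\beta) \;+\; C\sum_{d=1}^{D}\xi_d
\quad \text{s.t.} \quad
\mathbb{E}_q\!\big[\eta^\top \Delta\mathbf{f}_d(y)\big] \;\ge\; \Delta\ell_d(y) - \xi_d,\qquad \xi_d \ge 0,\qquad \forall d,\;\forall y,
\]

where \(\mathcal{L}(q;\alpha,\beta)\) is the negative variational lower bound of the underlying topic model's log likelihood, \(\Delta\mathbf{f}_d(y) = \mathbf{f}(y_d,\bar{\mathbf{z}}_d) - \mathbf{f}(y,\bar{\mathbf{z}}_d)\) is the margin feature difference built from the average topic assignments \(\bar{\mathbf{z}}_d\) of document d, and \(\eta\) collects the class-specific weight vectors whose posterior is being learned. Solving this coupled problem (e.g., by alternating variational updates for q with a multi-class SVM-style dual step for \(\eta\)) is what biases the learned topical representations toward the prediction task.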
