Scalable inference in max-margin topic models

Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. Combining STMs with advanced learning techniques has dramatically enhanced their predictive strength; a prominent example is max-margin supervised topic models, state-of-the-art methods that integrate max-margin learning with topic models. Though powerful, max-margin STMs pose a hard, non-smooth learning problem. Existing algorithms rely on solving multiple latent SVM subproblems in an EM-type procedure, which can be too slow for large-scale categorization tasks. In this paper, we present a highly scalable approach to building max-margin supervised topic models. Our approach builds on three key innovations: 1) a new formulation of Gibbs max-margin supervised topic models for both multi-class and multi-label classification; 2) a simple "augment-and-collapse" Gibbs sampling algorithm that makes no restrictive assumptions on the posterior distributions; 3) an efficient parallel implementation that can easily tackle data sets with hundreds of categories and millions of documents. Furthermore, our algorithm does not need to solve SVM subproblems. Although our methods perform topic discovery and predictive-model learning jointly, which significantly improves classification performance, their scalability is comparable to that of state-of-the-art parallel algorithms for standard LDA topic models, which perform only the single task of topic discovery. Finally, an open-source implementation is provided at: http://www.ml-thu.net/~jun/medlda.
