Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly balances two goals: recovery of faithful generative explanations of high-dimensional data, and accurate prediction of associated semantic labels. Existing approaches fail to achieve these goals due to an incomplete treatment of a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our prediction-constrained objective for training generative models coherently integrates loss-based supervisory signals while enabling effective semi-supervised learning from partially labeled data. We derive learning algorithms for semi-supervised mixture and topic models using stochastic gradient descent with automatic differentiation. We demonstrate improved prediction quality compared to several previous supervised topic models, achieving predictions competitive with high-dimensional logistic regression on text sentiment analysis and electronic health records tasks while simultaneously learning interpretable topics.
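The core idea above, a generative likelihood term on all data plus a weighted label-prediction loss on the labeled subset, jointly optimized by gradient descent, can be illustrated with a minimal sketch. The two-component 1-D Gaussian mixture, the logistic read-out on posterior responsibilities, the trade-off weight `lam`, and all variable names here are illustrative assumptions for a toy semi-supervised setting, not the paper's actual model or algorithm (which uses automatic differentiation rather than the numerical gradients used below for self-containment):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two 1-D Gaussian clusters; labels observed for only 10% of points.
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])
labeled = rng.choice(200, size=20, replace=False)

def objective(params, lam=5.0):
    """Prediction-constrained objective: generative NLL + lam * label loss."""
    mu0, mu1, w = params
    # Generative term: mixture negative log-likelihood on ALL points
    # (equal weights, unit variances, constants dropped).
    logp0 = -0.5 * (x - mu0) ** 2
    logp1 = -0.5 * (x - mu1) ** 2
    m = np.maximum(logp0, logp1)
    nll = -np.mean(m + np.log(0.5 * np.exp(logp0 - m) + 0.5 * np.exp(logp1 - m)))
    # Discriminative term: predict labels from posterior responsibilities,
    # matching the data -> label direction of the intended application.
    r1 = 1.0 / (1.0 + np.exp(logp0 - logp1))        # P(cluster 1 | x)
    p = 1.0 / (1.0 + np.exp(-w * (r1[labeled] - 0.5)))
    pred_loss = -np.mean(y[labeled] * np.log(p + 1e-9)
                         + (1 - y[labeled]) * np.log(1 - p + 1e-9))
    return nll + lam * pred_loss

def num_grad(f, params, eps=1e-5):
    """Central-difference gradient, standing in for automatic differentiation."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        d = np.zeros_like(params)
        d[i] = eps
        g[i] = (f(params + d) - f(params - d)) / (2 * eps)
    return g

params = np.array([-0.5, 0.5, 1.0])  # deliberately poor initialization
for _ in range(1000):
    params = params - 0.1 * num_grad(objective, params)

mu0, mu1, w = params  # means recover the clusters; w scales the label read-out
```

The `lam` weight encodes the asymmetry discussed above: it upweights the labels-from-data prediction task relative to pure generative fit, so the learned clusters remain predictive of labels even when only a small fraction of points are labeled.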
