Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models

Diversification has been shown to be a powerful mechanism for learning robust models in nonconvex settings. A notable example is learning mixture models, in which enforcing diversity between the different mixture components prevents model collapse and allows the model to capture more patterns from the observed data. In this work, we present a variational approach to diversity-promoting learning, which leverages the entropy functional as a natural mechanism for enforcing diversity. We develop a simple and efficient functional gradient-based algorithm for optimizing the variational objective, which provides a significant generalization of Stein variational gradient descent (SVGD). We test our method on challenging real-world problems, including deep embedded clustering and deep anomaly detection. Empirical results show that our method provides an effective mechanism for diversity-promoting learning, achieving substantial improvements over existing methods.
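For concreteness, below is a minimal NumPy sketch of the standard SVGD update of Liu and Wang (2016), which our method generalizes; it is not the nonlinear variant developed in this paper. The helper names (`rbf_kernel`, `svgd_update`) and the median-heuristic bandwidth are illustrative choices. The repulsive kernel-gradient term in the update is what keeps the particle set diverse, which is the property the diversity-promoting objective builds on.

```python
import numpy as np


def rbf_kernel(particles, h=None):
    """RBF kernel matrix and its gradient with respect to the first argument.

    When h is None, the bandwidth is set by the median heuristic, as in
    common SVGD implementations (an illustrative choice here).
    """
    diffs = particles[:, None, :] - particles[None, :, :]   # (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (n, n)
    if h is None:
        h = np.median(sq_dists) / np.log(len(particles) + 1) + 1e-8
    K = np.exp(-sq_dists / h)                                # (n, n)
    # grad_K[i, j] = d k(x_i, x_j) / d x_i
    grad_K = (-2.0 / h) * diffs * K[..., None]               # (n, n, d)
    return K, grad_K


def svgd_update(particles, grad_log_p, stepsize=0.1):
    """One SVGD step:
    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j)
                               + grad_{x_j} k(x_j, x_i) ].
    The first term drives particles toward high-density regions of p;
    the second is a repulsive force that keeps the particles spread out.
    """
    n = particles.shape[0]
    K, grad_K = rbf_kernel(particles)
    scores = grad_log_p(particles)                           # (n, d)
    # By symmetry of k, summing grad_K over its first axis yields
    # sum_j grad_{x_j} k(x_j, x_i) for each i.
    phi = (K @ scores + grad_K.sum(axis=0)) / n
    return particles + stepsize * phi


# Example: transport 50 particles toward a standard 2-D Gaussian,
# whose score function is grad log p(x) = -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
for _ in range(500):
    x = svgd_update(x, grad_log_p=lambda p: -p)
print(x.mean(axis=0), x.std(axis=0))  # roughly [0, 0] and [1, 1]
```

Replacing the fixed target density above with a mixture model whose components are themselves being learned, and augmenting the objective with an entropy term over components, is the direction this paper pursues; the sketch only illustrates the vanilla SVGD dynamics being generalized.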
