Sampling Sparse Representations with Randomized Measurement Langevin Dynamics

Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from probability distributions whose log-posterior derivatives are available. Given these derivatives, SGLD generates samples by simulating a thermostat-like dynamics that traverses the gradient flow of the log-posterior under controlled stochastic perturbation. Even when the density is unknown, existing methods can first fit a kernel density model to the given dataset and then produce new samples by running SGLD over the derivatives of the estimated density. In this work, instead of drawing new samples from kernel spaces, a novel SGLD sampler, Randomized Measurement Langevin Dynamics (RMLD), is proposed to sample high-dimensional sparse representations from the spectral domain of a given dataset. Specifically, given a random measurement matrix for sparse coding, RMLD first derives a likelihood evaluator of the probability distribution from the LASSO loss function, then samples from the resulting high-dimensional distribution using stochastic Langevin dynamics driven by the derivatives of the log-likelihood, together with Metropolis–Hastings sampling. In addition, new samples in the low-dimensional measurement space can be regenerated from the sampled high-dimensional vectors and the measurement matrix. The algorithmic analysis shows that RMLD in effect projects the given dataset onto a high-dimensional Gaussian distribution with a Laplacian prior, and then draws new sparse representations by performing SGLD over that distribution. Extensive experiments on real-world datasets have been conducted to evaluate the proposed algorithm; performance comparisons on three real-world applications demonstrate that RMLD outperforms the baseline methods.
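The following is a minimal sketch, not the authors' reference implementation, of the sampling loop the abstract describes: the negative LASSO loss is treated as an unnormalized log-density over high-dimensional sparse codes, Langevin proposals follow its gradient, a Metropolis–Hastings step corrects the discretization error, and the accepted code is mapped back to the measurement space through the random measurement matrix. The matrix `Phi`, the sparsity weight `lam`, the step size `eps`, and the number of steps are illustrative assumptions, as is the use of the l1 subgradient for the Laplacian prior term.

```python
# Sketch of a Metropolis-adjusted Langevin sampler over the negative LASSO loss
# (Gaussian likelihood + Laplacian prior), assuming a given random measurement matrix.
import numpy as np

def log_density(z, x, Phi, lam):
    """Unnormalized log-density of a sparse code z: negative LASSO loss."""
    residual = x - Phi @ z
    return -0.5 * residual @ residual - lam * np.abs(z).sum()

def grad_log_density(z, x, Phi, lam):
    """Gradient of the log-density (subgradient sign(z) for the l1 term)."""
    return Phi.T @ (x - Phi @ z) - lam * np.sign(z)

def sample_sparse_code(x, Phi, lam=0.1, eps=1e-3, n_steps=5000, rng=None):
    """Draw one high-dimensional sparse code for a measurement x via Langevin + MH."""
    rng = np.random.default_rng() if rng is None else rng
    n = Phi.shape[1]
    z = np.zeros(n)
    for _ in range(n_steps):
        g = grad_log_density(z, x, Phi, lam)
        z_prop = z + 0.5 * eps * g + np.sqrt(eps) * rng.standard_normal(n)
        # Metropolis-Hastings correction with the asymmetric Langevin proposal.
        g_prop = grad_log_density(z_prop, x, Phi, lam)
        log_q_fwd = -np.sum((z_prop - z - 0.5 * eps * g) ** 2) / (2 * eps)
        log_q_bwd = -np.sum((z - z_prop - 0.5 * eps * g_prop) ** 2) / (2 * eps)
        log_alpha = (log_density(z_prop, x, Phi, lam) - log_density(z, x, Phi, lam)
                     + log_q_bwd - log_q_fwd)
        if np.log(rng.uniform()) < log_alpha:
            z = z_prop
    return z

# Usage: sample a sparse code for one (stand-in) measurement, then regenerate
# a new sample in the low-dimensional measurement space through Phi.
rng = np.random.default_rng(0)
m, n = 64, 256                                    # illustrative dimensions
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random measurement matrix
x = rng.standard_normal(m)                        # stand-in for one observed data point
z_new = sample_sparse_code(x, Phi, rng=rng)
x_new = Phi @ z_new                               # regenerated low-dimensional sample
```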
