On scalable inference and learning in spike-and-slab sparse coding

Sparse coding is a widely applied latent variable analysis technique. Its standard formulation assumes a Laplace prior distribution for modeling the activations of latent components. In this work we study sparse coding with a spike-and-slab distribution as the prior on latent activity. A spike-and-slab distribution divides its probability mass between a 'spike' at zero and a 'slab' spreading over a continuous range. Because it assigns high probability to exactly zero activations, a spike-and-slab prior constitutes a more accurate model of sparse coding; it also allows the sparseness of latent activity to be inferred directly from observed data, which makes spike-and-slab sparse coding more flexible and self-adaptive to a wide range of data distributions. By modeling the slab with a Gaussian distribution (see the model sketch following this abstract), we furthermore show that, in contrast to the standard approach to sparse coding, closed-form analytical expressions for exact inference and learning can be derived for linear spike-and-slab sparse coding. However, the posterior landscape under a spike-and-slab prior turns out to be highly multimodal and prohibitively expensive to explore exhaustively, so in addition to the exact method we develop approximate inference techniques based on subspace selection and Gibbs sampling for scalable applications of the linear model (a minimal sampler sketch is also given below), and we contrast these with variational approximations for scalable posterior inference in linear spike-and-slab sparse coding.

We further combine the Gaussian spike-and-slab prior with a nonlinear generative model that assumes a point-wise maximum combination rule for the generation of observed data (also sketched below). We analyze this model as a precise encoder of low-level features such as edges and their occlusions in visual data, and we again combine subspace selection with Gibbs sampling to overcome the analytical intractability of exact inference in the model.

We numerically analyze our methods on both synthetic and real data, verifying them and comparing them with other approaches. We assess the linear spike-and-slab approach on source separation and image denoising benchmarks. In most experiments we obtain competitive or state-of-the-art results, and we find that spike-and-slab sparse coding overall outperforms comparable approaches. By extracting thousands of latent components from large amounts of training data, we further demonstrate that our subspace Gibbs sampler is among the most scalable posterior inference methods for linear sparse coding. For the nonlinear model, experiments on artificial and real images demonstrate that the learned components lie closer to the ground truth and are easily interpretable as the underlying generative causes of the input. We find that, in comparison to standard sparse coding, the nonlinear spike-and-slab approach can compressively encode images using naturally sparse and discernible compositions of latent components. We also demonstrate that the components the model infers from natural image patches are, in their structure and distribution, statistically more consistent with the response patterns of simple cells in the primary visual cortex. This work thereby contributes novel methods for sophisticated inference and learning in spike-and-slab sparse coding, and it empirically demonstrates their efficacy through a variety of applications.
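For concreteness, the linear generative model can be sketched as follows. This is a minimal formulation in a standard parameterization (a Bernoulli spike variable multiplying a Gaussian slab, with isotropic Gaussian observation noise); the exact parameterization used in the work, e.g. of the slab mean and covariance, may differ:

    s_h = b_h z_h, \qquad b_h \sim \mathrm{Bern}(\pi), \quad z_h \sim \mathcal{N}(\mu_h, \psi_h^2),
    \text{equivalently} \quad p(s_h) = (1 - \pi)\,\delta(s_h) + \pi\,\mathcal{N}(s_h;\, \mu_h, \psi_h^2),
    \vec{y} = W \vec{s} + \vec{\epsilon}, \qquad \vec{\epsilon} \sim \mathcal{N}(\vec{0}, \sigma^2 \mathbb{1}).

Because the slab and the noise are both Gaussian, expectations under the posterior p(\vec{s} \mid \vec{y}) decompose into a finite sum over binary support configurations \vec{b}, each contributing a Gaussian integral that is available in closed form. This is what makes exact expectation maximization tractable in principle, and it is exactly this exponentially large sum that subspace preselection truncates in practice.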
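The nonlinear model replaces the linear superposition by a point-wise maximum. Again as a sketch, writing \bar{y}_d for the mean of observed dimension d:

    \bar{y}_d = \max_h \{ W_{dh}\, s_h \} \qquad \text{instead of} \qquad \bar{y}_d = \sum_h W_{dh}\, s_h.

The maximum makes the model a natural description of occlusion: at each observed dimension a single component alone determines the mean, much as the frontmost object alone determines a pixel's value, which is why the learned components can represent edges and their occlusions. The same nonlinearity removes the analytical tractability enjoyed by the linear Gaussian case, which motivates the combined subspace selection and Gibbs sampling.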
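Finally, the following is a minimal, self-contained sketch of the kind of subspace Gibbs sampler referred to above, for a single data point under the linear model. All concrete choices here (zero slab mean, a shared slab variance, and a correlation-based preselection score) are illustrative assumptions rather than the exact algorithm developed in this work:

    import numpy as np

    def log_marginal(y, W_active, sigma_n, sigma_s):
        # log N(y; 0, sigma_n^2 I + sigma_s^2 W_a W_a^T): the likelihood of y
        # with the zero-mean Gaussian slab integrated out analytically.
        D = y.shape[0]
        C = sigma_n ** 2 * np.eye(D) + sigma_s ** 2 * (W_active @ W_active.T)
        sign, logdet = np.linalg.slogdet(C)
        return -0.5 * (D * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

    def subspace_gibbs(y, W, pi=0.1, sigma_n=0.1, sigma_s=1.0, K=8, n_iter=50, seed=0):
        # Gibbs sampling of the binary support z, restricted to the K latents
        # whose dictionary columns correlate most strongly with y (an
        # illustrative preselection heuristic); all others are clamped to zero.
        rng = np.random.default_rng(seed)
        H = W.shape[1]
        subspace = np.argsort(-np.abs(W.T @ y))[:K]
        z = np.zeros(H, dtype=bool)
        samples = []
        for _ in range(n_iter):
            for h in subspace:
                # Conditional of z_h given all other z: compare the marginal
                # likelihoods with z_h on vs. off, weighted by the Bernoulli prior.
                z[h] = True
                log_on = np.log(pi) + log_marginal(y, W[:, z], sigma_n, sigma_s)
                z[h] = False
                log_off = np.log(1 - pi) + log_marginal(y, W[:, z], sigma_n, sigma_s)
                p_on = 1.0 / (1.0 + np.exp(np.clip(log_off - log_on, -50, 50)))
                z[h] = rng.random() < p_on
            samples.append(z.copy())
        return subspace, np.array(samples)

    # Toy usage: a random dictionary and one sparsely generated data point.
    rng = np.random.default_rng(1)
    W = rng.normal(size=(16, 64))
    s_true = np.zeros(64)
    s_true[[3, 20]] = 1.5
    y = W @ s_true + 0.1 * rng.normal(size=16)
    subspace, samples = subspace_gibbs(y, W)
    freq = samples.mean(axis=0)
    print({int(h): round(float(freq[h]), 2) for h in subspace})

Because the Gaussian slab can be integrated out analytically, only the binary support variables need to be sampled, and the per-data-point cost is governed by the subspace size K rather than by the full number of latent components; this is the property that the scalability results above rest on.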
