Towards Understanding Sparse Filtering: A Theoretical Perspective

In this paper we present a theoretical analysis of sparse filtering, a recent and effective algorithm for unsupervised learning. The aim of this research is not to show whether or how well sparse filtering works, but to understand why and when it works. We provide a thorough theoretical analysis of sparse filtering and its properties, and offer an experimental validation of the main outcomes of this analysis. We show that sparse filtering works by explicitly maximizing the entropy of the learned representations through the maximization of a proxy of sparsity, and by implicitly preserving mutual information between the original and the learned representations through the constraint of preserving a structure of the data. Specifically, we show that sparse filtering implemented with an absolute-value non-linearity preserves the data structure defined by neighborhood relations under the cosine distance. Furthermore, we empirically validate our theoretical results on artificial and real data sets, and we apply our theoretical understanding to explain the success of sparse filtering on real-world problems. Our work provides a strong theoretical basis for understanding sparse filtering: it highlights the assumptions and conditions for success behind this feature distribution learning algorithm, and provides insights for developing new feature distribution learning algorithms.
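To make the analyzed mechanism concrete, the sketch below illustrates the standard sparse filtering objective of Ngiam et al. (2011) with the absolute-value non-linearity discussed above: features are computed as |WX|, normalized first per feature (across examples) and then per example (across features), and the L1 norm of the result is minimized. This is a minimal illustration under our own naming conventions, not the authors' implementation; the variables W, X and the eps stabilizer are illustrative choices.

import numpy as np

def sparse_filtering_objective(W, X, eps=1e-8):
    # Features via the absolute-value non-linearity: F[j, i] = |w_j . x_i|
    F = np.abs(W @ X)                                            # shape: (n_features, n_examples)
    # Normalize each feature (row) by its L2 norm across examples
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)
    # Normalize each example (column) by its L2 norm across features
    F = F / (np.linalg.norm(F, axis=0, keepdims=True) + eps)
    # Objective: L1 norm of the doubly normalized representation (to be minimized over W)
    return F.sum()

# Toy usage: random data and weights, evaluate the objective once
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))    # 20-dimensional inputs, 100 examples
W = rng.standard_normal((50, 20))     # 50 learned features
print(sparse_filtering_objective(W, X))

Minimizing this objective with respect to W (for example with an off-the-shelf optimizer such as L-BFGS) drives the normalized representation toward sparsity; in the analysis above, this sparsity serves as a proxy for maximizing the entropy of the learned representations, while the absolute-value non-linearity underlies the preservation of neighborhood relations under the cosine distance.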
