There and Back Again: A General Approach to Learning Sparse Models

We propose a simple and efficient approach to learning sparse models. Our approach consists of (1) projecting the data into a lower-dimensional space, (2) learning a dense model in the lower-dimensional space, and then (3) recovering the sparse model in the original space via compressive sensing. We apply this approach to Non-negative Matrix Factorization (NMF), tensor decomposition, and linear classification, showing that it obtains 10× compression with negligible loss in accuracy on real data and up to 5× speedups. Our main theoretical contribution is the following result for NMF: if the original factors are sparse, then their projections are the sparsest solutions to the projected NMF problem. This explains why our method works for NMF and reveals an interesting new property of random projections: they can preserve the solutions of non-convex optimization problems such as NMF.
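The three steps can be illustrated with a minimal sketch on a toy problem. This is not the authors' implementation. It makes several simplifying assumptions that do not come from the abstract: the data is an exact rank-1 product of a sparse factor `w` and a dense factor `h`, the projection is a dense Gaussian matrix, a truncated SVD stands in for the NMF solver in step (2), and Orthogonal Matching Pursuit stands in for the compressive-sensing recovery in step (3). The names `omp` and `run_demo` and all sizes are hypothetical.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedy k-sparse solution of A x = y."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = 0.0  # never select the same column twice
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


def run_demo(d=300, m=100, n=50, k=3, seed=0):
    """Toy project / learn / recover pipeline (sizes are illustrative)."""
    rng = np.random.default_rng(seed)

    # Ground truth: a k-sparse factor w (length d) and a dense factor h.
    w = np.zeros(d)
    w[rng.choice(d, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
    h = rng.uniform(0.5, 1.5, size=n)
    X = np.outer(w, h)  # d x n data matrix, X = w h^T

    # (1) Project the data from d to m dimensions.
    A = rng.standard_normal((m, d)) / np.sqrt(m)
    X_proj = A @ X  # equals (A w) h^T

    # (2) Learn a dense model in the projected space.
    #     Assumption: a rank-1 SVD replaces the NMF solver here.
    U, S, Vt = np.linalg.svd(X_proj, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    if v.sum() < 0:  # h is positive, so fix the SVD sign ambiguity
        u, v = -u, -v
    w_proj = S[0] * u  # approximately A @ (w * ||h||)

    # (3) Recover the sparse factor in the original space.
    w_rec = omp(A, w_proj, k)
    return w * np.linalg.norm(h), w_rec


if __name__ == "__main__":
    w_scaled, w_rec = run_demo()
    print("max abs error:", np.max(np.abs(w_scaled - w_rec)))
```

The factor is recovered only up to the scale `||h||`, because a rank-1 factorization fixes the factors only up to a scalar. Step (2) never touches the original `d`-dimensional data, which is where the compression and speedup described above would come from.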
