Sketching Datasets for Large-Scale Learning (long version)

This article considers "sketched learning," also known as "compressive learning," an approach to large-scale machine learning in which datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is first constructed by computing carefully chosen nonlinear random features (e.g., random Fourier features) and averaging them over the whole dataset. Parameters are then learned from the sketch, without access to the original dataset. This article surveys the current state of the art in sketched learning, including the main concepts and algorithms, their connections with established signal-processing methods, existing theoretical guarantees on both information preservation and privacy preservation, and important open problems.
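To make the sketching step concrete, the snippet below is a minimal illustration (not the article's reference implementation) of forming a sketch with random Fourier features: each data point is mapped through cosine and sine features at randomly drawn frequencies, and the features are averaged over the dataset. The function name, the Gaussian choice of frequency distribution, and the bandwidth parameter sigma are illustrative assumptions.

```python
import numpy as np

def compute_sketch(X, m, sigma=1.0, seed=None):
    """Average random Fourier features over a dataset.

    X     : (n, d) data matrix
    m     : number of random frequencies (the sketch has 2*m real entries)
    sigma : bandwidth of the (assumed) Gaussian frequency distribution
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw m random frequency vectors omega_k ~ N(0, sigma^{-2} I)
    Omega = rng.normal(scale=1.0 / sigma, size=(d, m))
    # Nonlinear random features Phi(x) = [cos(Omega^T x), sin(Omega^T x)]
    XW = X @ Omega                                   # (n, m) projections
    features = np.hstack([np.cos(XW), np.sin(XW)])   # (n, 2m)
    # The sketch is the empirical mean of the features over the whole dataset
    return features.mean(axis=0)                     # (2m,) sketch vector

# Usage: a 100,000-point, 10-dimensional dataset is summarized by 1,000 numbers;
# downstream learning (e.g., clustering) would then use only this sketch.
X = np.random.randn(100_000, 10)
s = compute_sketch(X, m=500, sigma=2.0, seed=0)
print(s.shape)  # (1000,)
```

The key property is that the sketch size depends on the number of features m, not on the number of data points n, which is what allows learning from the sketch alone after the original dataset is discarded.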
