Sketching for Large-Scale Learning of Mixture Models

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework in which we first sketch the data by computing random generalized moments of the underlying probability distribution, then estimate mixture model parameters from the sketch using an iterative algorithm analogous to greedy sparse signal recovery. We exemplify our framework with the sketched estimation of Gaussian Mixture Models (GMMs). We show experimentally that our approach yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and computation when the number of database elements is large. We report large-scale experiments in speaker verification, where our approach makes it possible to fully exploit a corpus of 1,000 hours of speech to learn a universal background model at scales computationally inaccessible to EM.
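
As a rough illustration of the first step, the snippet below computes such a sketch as samples of the empirical characteristic function at random frequencies (a form of random Fourier features). This is a minimal sketch under stated assumptions, not the thesis's exact implementation: the helper name `sketch_dataset` and the standard Gaussian frequency distribution are illustrative choices, and in practice the frequency law would be matched to the scale of the data.

```python
import numpy as np

def sketch_dataset(X, Omega):
    """Sketch the empirical distribution of X with random Fourier moments.

    Each sketch entry is a sample of the empirical characteristic
    function, z_j = (1/n) * sum_i exp(i <omega_j, x_i>), i.e. a random
    generalized moment of the data distribution.

    X     : (n, d) data matrix
    Omega : (m, d) random frequencies (m = sketch size, independent of n)
    """
    projections = X @ Omega.T          # (n, m) inner products <omega_j, x_i>
    # Average of complex exponentials; this mean can be updated one
    # sample (or mini-batch) at a time, so the full dataset never needs
    # to be held in memory.
    return np.exp(1j * projections).mean(axis=0)

# Toy usage: sketch n = 10,000 points from a 2-component GMM into m = 100 moments.
rng = np.random.default_rng(0)
n, d, m = 10_000, 2, 100
X = np.concatenate([rng.normal(-2.0, 1.0, (n // 2, d)),
                    rng.normal(+2.0, 1.0, (n // 2, d))])
Omega = rng.normal(0.0, 1.0, (m, d))   # illustrative Gaussian frequency law
z = sketch_dataset(X, Omega)
print(z.shape)                         # (100,): fixed size, whatever n is
```

Parameter estimation can then proceed without revisiting the data: since a Gaussian N(μ, Σ) has characteristic function exp(i ωᵀμ − ωᵀΣω/2), candidate mixture components can be sketched in closed form and greedily selected to match the residual of z, in the spirit of Orthogonal Matching Pursuit.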
