Large-Scale Kernel-Based Feature Extraction via Low-Rank Subspace Tracking on a Budget

Kernel-based methods enjoy powerful generalization capabilities in learning a variety of pattern recognition tasks. When such methods are provided with sufficient training data, broadly applicable classes of nonlinear functions can be approximated with desired accuracy. Nevertheless, inherent to the nonparametric nature of kernel-based estimators are computational and memory requirements that become prohibitive with large-scale datasets. In response to this formidable challenge, this paper puts forward a low-rank, kernel-based feature extraction approach that is particularly tailored for online operation. A novel generative model is introduced to approximate high-dimensional (possibly infinite) features via a low-rank nonlinear subspace, the learning of which lends itself to a kernel function approximation. Offline and online solvers are developed for the subspace learning task, along with affordable versions in which the number of stored data vectors is confined to a predefined budget. Analytical results provide performance bounds on how well the kernel matrix, as well as kernel-based classification and regression tasks, can be approximated by leveraging the budgeted online subspace learning and feature extraction schemes. Tests on synthetic and real datasets demonstrate and benchmark the efficiency of the proposed method for dynamic nonlinear subspace tracking as well as online classification and regression tasks.
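To make the budgeted online operation concrete, below is a minimal sketch, not the paper's exact algorithm: at most B data vectors are stored, a rank-r subspace is obtained from the eigendecomposition of their kernel matrix in Nystrom fashion, and new data are mapped to r-dimensional nonlinear features suitable for a linear classifier or regressor. All names and parameters here (budget B, rank r, bandwidth gamma, the oldest-first eviction rule) are illustrative assumptions; the paper's online solver instead updates the subspace incrementally per datum rather than refactorizing the budget kernel matrix at every step.

```python
# Minimal sketch of budgeted online kernel feature extraction
# (Gaussian kernel, Nystrom-style rank-r projection). Illustrative only.
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Pairwise Gaussian kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class BudgetedKernelSubspace:
    def __init__(self, budget=50, rank=5, gamma=1.0):
        self.B, self.r, self.gamma = budget, rank, gamma
        self.anchors = []  # stored data vectors (the "budget")

    def partial_fit(self, x):
        # Keep at most B stored vectors; here we evict the oldest one
        # (random or approximation-error-based eviction also works).
        self.anchors.append(x)
        if len(self.anchors) > self.B:
            self.anchors.pop(0)
        A = np.asarray(self.anchors)
        K = gaussian_kernel(A, A, self.gamma)
        # Rank-r eigendecomposition of the budget kernel matrix.
        w, V = np.linalg.eigh(K)
        idx = np.argsort(w)[::-1][: self.r]
        w, V = np.maximum(w[idx], 1e-12), V[:, idx]
        # Maps kernel similarities to r-dimensional features.
        self.proj = V / np.sqrt(w)

    def transform(self, X):
        # Low-rank nonlinear features for new data.
        A = np.asarray(self.anchors)
        return gaussian_kernel(np.atleast_2d(X), A, self.gamma) @ self.proj

# Usage: stream data in, then extract features for a downstream task.
rng = np.random.default_rng(0)
model = BudgetedKernelSubspace(budget=20, rank=3, gamma=0.5)
for t in range(100):
    model.partial_fit(rng.normal(size=4))
Z = model.transform(rng.normal(size=(10, 4)))  # shape (10, 3)
```

Refactorizing the kernel matrix costs O(B^3) per datum, which is affordable only because B is fixed; this is precisely where an incremental online solver of the kind the abstract describes pays off.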
