Adapting Kernel Representations Online Using Submodular Maximization

Kernel representations provide nonlinear features through similarities to a set of prototypes, yet require only simple linear learning algorithms once those prototypes are chosen. In a continual learning setting, with a constant stream of observations, an efficient mechanism for selecting prototypes from among the observations is critical. In this work, we develop an approximately submodular criterion for this setting and an efficient online greedy submodular maximization algorithm for optimizing it. We extend streaming submodular maximization algorithms to continual learning by removing the need for multiple passes over the data, which are infeasible on a stream, and by instead introducing the idea of coverage time. We also propose a general block-diagonal approximation for the greedy update under our criterion that enables updates linear in the number of prototypes. We empirically demonstrate the effectiveness of this approximation in terms of approximation quality, significant runtime improvements, and strong prediction performance.
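
The abstract leaves the concrete criterion, the streaming acceptance rule, and the coverage-time mechanism unspecified, so the sketch below fills them with common stand-ins: a log-determinant objective over the regularized prototype kernel matrix (a standard submodular criterion for prototype selection), a novelty threshold on the marginal gain, and a recency counter as the coverage-time signal. The class name, parameter names, and thresholds are all hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, Y, gamma=1.0):
    """RBF similarities between a point x and the rows of Y."""
    return np.exp(-gamma * np.sum((Y - x) ** 2, axis=1))

class StreamingPrototypeSelector:
    """One-pass greedy prototype selection under a fixed budget.

    Hypothetical stand-ins for details the abstract does not specify:
      * the criterion is the log-determinant of the regularized
        prototype kernel matrix, a standard submodular objective;
      * 'coverage time' is tracked as the last step at which a
        prototype was the nearest match for an observation, and the
        stalest prototype is replaced once the budget is full.
    """

    def __init__(self, budget, gamma=1.0, reg=1e-2, novelty_threshold=0.3):
        self.budget = budget
        self.gamma = gamma
        self.reg = reg
        self.novelty_threshold = novelty_threshold
        self.prototypes = []     # selected points (1-D arrays)
        self.last_covered = []   # step at which each prototype last covered a point
        self.t = 0

    def _novelty(self, x):
        """Schur complement k(x,x) + reg - k_x^T (K_S + reg*I)^{-1} k_x.

        Its logarithm is exactly the marginal log-det gain of adding x.
        """
        kxx = 1.0 + self.reg              # k(x, x) = 1 for the RBF kernel
        if not self.prototypes:
            return kxx
        P = np.vstack(self.prototypes)
        K = np.exp(-self.gamma * np.sum((P[:, None] - P[None]) ** 2, axis=2))
        K += self.reg * np.eye(len(P))
        k_x = rbf_kernel(x, P, self.gamma)
        return kxx - k_x @ np.linalg.solve(K, k_x)

    def observe(self, x):
        self.t += 1
        if self.prototypes:
            # The nearest prototype 'covers' this observation.
            sims = rbf_kernel(x, np.vstack(self.prototypes), self.gamma)
            self.last_covered[int(np.argmax(sims))] = self.t
        if self._novelty(x) < self.novelty_threshold:
            return                        # marginal gain too small; skip
        if len(self.prototypes) < self.budget:
            self.prototypes.append(np.asarray(x, dtype=float))
            self.last_covered.append(self.t)
        else:
            # Evict the prototype with the oldest coverage time.
            stale = int(np.argmin(self.last_covered))
            self.prototypes[stale] = np.asarray(x, dtype=float)
            self.last_covered[stale] = self.t

# Usage on a synthetic stream:
rng = np.random.default_rng(0)
selector = StreamingPrototypeSelector(budget=20, gamma=0.5)
for _ in range(1000):
    selector.observe(rng.normal(size=3))
print(f"{len(selector.prototypes)} prototypes selected")
```

Because the Schur complement computed in `_novelty` is exactly the determinant ratio, thresholding it is equivalent to thresholding the marginal log-det gain; replacing the prototype with the oldest coverage time is one plausible reading of how coverage time could drive eviction in a single-pass setting.

The block-diagonal approximation of the greedy update can be sketched under the same assumptions: dropping the cross-block entries of the prototype kernel matrix splits the inner solve into fixed-size independent systems, so each gain evaluation costs time linear in the number of prototypes rather than cubic. Again, `blockdiag_novelty` and its parameters illustrate the general idea, not the paper's exact update.

```python
def blockdiag_novelty(x, prototypes, gamma=0.5, reg=1e-2, block=8):
    """Novelty of x under a block-diagonal approximation of K_S.

    Dropping cross-block kernel entries splits the linear solve into
    independent systems of size `block`, so the cost of one greedy
    evaluation grows linearly with the number of prototypes.
    """
    kxx = 1.0 + reg
    quad = 0.0
    for i in range(0, len(prototypes), block):
        P = np.vstack(prototypes[i:i + block])
        K = np.exp(-gamma * np.sum((P[:, None] - P[None]) ** 2, axis=2))
        K += reg * np.eye(len(P))
        k_x = rbf_kernel(x, P, gamma)
        quad += k_x @ np.linalg.solve(K, k_x)
    return max(kxx - quad, 0.0)
```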
