Sparse Multiresolution Representations With Adaptive Kernels

Reproducing kernel Hilbert spaces (RKHSs) are key components of non-parametric tools used in signal processing, statistics, and machine learning. In this work, we aim to address three issues of classical RKHS-based techniques. First, they require the RKHS to be known a priori, which is unrealistic in many applications; moreover, the choice of RKHS affects the shape and smoothness of the solution, and thus its performance. Second, RKHSs are ill-equipped to deal with heterogeneous degrees of smoothness, i.e., with functions that are smooth in some parts of their domain but vary rapidly in others. Finally, the computational complexity of evaluating the solution of these methods grows with the number of data points in the training sample, rendering these techniques infeasible for many applications. Although kernel learning, local kernel adaptation, and sparsity have been used to address these issues, many of these approaches are computationally intensive or forgo optimality guarantees. We tackle these problems by leveraging a novel integral representation of functions in RKHSs that allows for arbitrary centers and different kernels at each center. To address the complexity issue, we then cast the function estimation problem as a sparse functional program that explicitly minimizes the support of the representation, leading to low-complexity solutions. Despite their non-convexity and infinite dimensionality, we show that these problems can be solved exactly and efficiently by leveraging duality, and we illustrate this new approach on simulated and real data.
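To make the idea concrete, the following is a minimal, self-contained sketch of a discretized analogue of this setup: each candidate (center, bandwidth) pair contributes one Gaussian kernel atom, and an L1 penalty stands in for the support-minimizing sparse functional program of the paper. The kernel family, the grid of centers and widths, the regularization weight, and the ISTA solver are all illustrative assumptions, not the paper's continuous formulation or its duality-based algorithm.

```python
# Toy sketch (assumed setup): sparse regression over a dictionary of Gaussian
# kernels with per-center bandwidths, using an L1 surrogate for the support.
import numpy as np

rng = np.random.default_rng(0)

# 1-D data with heterogeneous smoothness: a slow trend plus a sharp local bump.
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = (np.sin(2 * np.pi * x)
     + np.exp(-((x - 0.7) ** 2) / (2 * 0.01 ** 2))
     + 0.05 * rng.standard_normal(x.size))

# Dictionary of atoms: every (center, width) pair gets its own column, so
# different regions of the domain can select different bandwidths.
centers = np.linspace(0.0, 1.0, 50)
widths = np.array([0.01, 0.05, 0.2])
atoms = [(c, s) for c in centers for s in widths]
K = np.column_stack([np.exp(-((x - c) ** 2) / (2 * s ** 2)) for c, s in atoms])

# ISTA (proximal gradient) for the LASSO surrogate:
#   minimize 0.5 * ||K a - y||^2 + lam * ||a||_1
lam = 0.1
a = np.zeros(K.shape[1])
step = 1.0 / np.linalg.norm(K, 2) ** 2  # 1 / Lipschitz constant of the gradient
for _ in range(5000):
    grad = K.T @ (K @ a - y)
    a = a - step * grad
    a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft-threshold

support = np.flatnonzero(np.abs(a) > 1e-6)
print(f"{support.size} active atoms out of {len(atoms)}")
for idx in support:
    c, s = atoms[idx]
    print(f"  center={c:.2f}, width={s:.2f}, coeff={a[idx]:+.3f}")
```

In this toy version, the narrow-bandwidth atoms are free to capture the sharp bump near x = 0.7 while wide atoms model the slow sinusoidal trend, and the sparsity penalty keeps the number of active atoms, and hence the evaluation cost of the estimate, small.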
