Recent Advances and Trends in Large-Scale Kernel Methods

Kernel methods such as the support vector machine are among the most successful algorithms in modern machine learning. Their advantage is that the kernel trick extends linear algorithms to non-linear settings in a straightforward way. However, naive use of kernel methods is computationally expensive, since the computational complexity typically scales cubically with the number of training samples. In this article, we review recent advances in kernel methods, with emphasis on scalability to massive problems.
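
To make the cubic cost concrete, here is a minimal sketch of a naive kernel method. It uses kernel ridge regression with a Gaussian (RBF) kernel rather than the support vector machine named above, purely because it fits in a few lines; the function names, the toy data, and the values of `gamma` and `lam` are illustrative assumptions, not taken from the article. The linear algorithm (ridge regression) becomes non-linear simply by replacing inner products with kernel evaluations, and the dense n-by-n solve is the step whose cost grows cubically with the number of training samples n.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)          # O(n^2) memory for the Gram matrix
    n = K.shape[0]
    # Dense linear solve: O(n^3) time, the bottleneck for large n.
    return np.linalg.solve(K + lam * np.eye(n), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Prediction is a kernel-weighted sum over all training samples."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy non-linear regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])

alpha = fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
y_pred = predict(X, alpha, X_test)
max_err = float(np.max(np.abs(y_pred - np.sin(X_test[:, 0]))))
print("max abs error on test grid:", max_err)
```

Doubling the number of training samples roughly quadruples the memory for the Gram matrix and multiplies the solve time by about eight, which is why large problems call for approximations or specialized solvers.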
