Transformation Learning Via Kernel Alignment

This article proposes an algorithm that automatically learns useful transformations of data to improve accuracy in supervised classification tasks. These transformations take the form of a mixture of base transformations and are learned by maximizing the kernel alignment criterion. Because the proposed optimization is nonconvex, a semidefinite relaxation is derived to find an approximate global solution. The resulting convex algorithm learns kernels composed of a matrix mixture of transformations. This formulation yields a simpler optimization while achieving accuracy comparable to or better than previous transformation learning algorithms based on maximizing the margin. Remarkably, the new optimization problem does not slow down as additional data becomes available, allowing it to scale to large datasets. One application of this method is learning monotonic transformations constructed from a base set of truncated ramp functions; these monotonic transformations permit a nonlinear filtering of the input to the classifier. The effectiveness of the method is demonstrated on synthetic, text, and image data.
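To make the two core ingredients concrete, here is a minimal sketch: the standard kernel-target alignment score A(K, yy^T) = ⟨K, yy^T⟩_F / (‖K‖_F ‖yy^T‖_F), and a monotonic transformation built as a nonnegative mixture of truncated ramp functions. The knot placement and function names are illustrative assumptions, not the paper's exact formulation (which optimizes the mixture via a semidefinite relaxation rather than fixing the weights).

```python
import numpy as np

def kernel_alignment(K, y):
    """Kernel-target alignment A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    Y = np.outer(y, y)  # ideal target kernel for labels y in {-1, +1}
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def truncated_ramp(x, a, b):
    """Base transformation: 0 below a, linear on [a, b], saturates at 1 above b."""
    return np.clip((x - a) / (b - a), 0.0, 1.0)

def mixed_transform(x, knots, weights):
    """Nonnegative mixture of truncated ramps -- monotonic by construction."""
    return sum(w * truncated_ramp(x, a, b) for (a, b), w in zip(knots, weights))
```

A perfectly label-aligned kernel (K = yy^T) scores 1, while an uninformative identity kernel scores 1/sqrt(n); in the paper, the mixture weights are the optimization variables that maximize this score rather than fixed constants as here.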
