Learning Random Fourier Features by Hybrid Constrained Optimization

Kernel embedding algorithms are an important component in adapting kernel methods to large datasets. Because computing the embedding accounts for a major share of the computational cost in the testing phase, we propose a novel teacher-learner framework for learning computation-efficient kernel embeddings from specific data. In this framework, high-precision embeddings (the teacher) transfer data information to computation-efficient kernel embeddings (the learner). We jointly select informative embedding functions and pursue an orthogonal transformation between the two embeddings. To this end, we propose constrained variational expectation maximization (CVEM), in which the alternating direction method of multipliers (ADMM) is applied over a nonconvex domain in the maximization step. We also propose two specific formulations based on the prevalent Random Fourier Feature (RFF) method, the masked and blocked versions of Computation-Efficient RFF (CERF), obtained by imposing a random binary mask or a block structure on the transformation matrix. In empirical studies of several applications on different real-world datasets, we demonstrate that CERF significantly improves the performance of kernel methods over RFF under given constraints on the number of arithmetic operations, and that it is suitable for structured matrix multiplication in Fastfood-type algorithms.
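For orientation, the standard RFF baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain random Fourier features for a Gaussian kernel, not the CERF or CVEM procedure proposed here; the function names `rff_features` and `gaussian_kernel`, the kernel parameterization k(x, y) = exp(-gamma * ||x - y||^2), and all sizes are assumptions chosen for the example.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Map X (n_samples, d) to random Fourier features that approximate
    the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).
    (Hypothetical helper; the parameterization is an assumption.)"""
    d = X.shape[1]
    # The spectral density of this kernel is Gaussian with variance
    # 2 * gamma per coordinate, so frequencies are drawn from it.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # The dense product X @ W is the main arithmetic cost of the embedding.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def gaussian_kernel(X, gamma):
    """Exact Gaussian kernel matrix, used only as a reference."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # toy data, assumed for illustration
Z = rff_features(X, n_features=4000, gamma=0.1, rng=rng)

# Inner products of the features approximate the exact kernel matrix.
max_err = np.max(np.abs(Z @ Z.T - gaussian_kernel(X, 0.1)))
```

The dense projection `X @ W` dominates the cost of computing the embedding at test time, which is the step that a sparse mask or block structure on the transformation matrix would be expected to make cheaper.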
