ORCCA: Optimal Randomized Canonical Correlation Analysis.

The random features approach has been widely used for kernel approximation in large-scale machine learning. A number of recent studies have explored data-dependent sampling of features, modifying the stochastic oracle from which random features are sampled. While the techniques proposed in this realm improve the approximation, their suitability is often verified on a single learning task. In this article, we propose a task-specific scoring rule for selecting random features, which can be employed for different applications with some adjustments. We restrict our attention to canonical correlation analysis (CCA) and provide a novel, principled guide for finding the score function that maximizes the canonical correlations. We prove that this method, called optimal randomized CCA (ORCCA), can outperform (in expectation) the corresponding kernel CCA with a default kernel. Numerical experiments verify that ORCCA is significantly superior to other approximation techniques in the CCA task.
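For readers unfamiliar with the setting, the following is a minimal sketch of the plain (data-independent) baseline that the abstract builds on: random Fourier features for a Gaussian kernel, followed by regularized linear CCA on the feature maps. It is not the ORCCA scoring rule, which the abstract does not specify. The function names, the Gaussian kernel choice, and the parameters `gamma` and `reg` are illustrative assumptions.

```python
import numpy as np


def random_fourier_features(X, num_features, gamma, rng):
    """Map X (n, d) to Z (n, num_features) so that Z @ Z.T approximates
    the Gaussian kernel exp(-gamma * ||x - y||^2). Illustrative helper."""
    d = X.shape[1]
    # Frequencies come from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def linear_cca(Zx, Zy, reg):
    """Return regularized canonical correlations between two feature views."""
    n = Zx.shape[0]
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    Cxx = Zx.T @ Zx / n + reg * np.eye(Zx.shape[1])
    Cyy = Zy.T @ Zy / n + reg * np.eye(Zy.shape[1])
    Cxy = Zx.T @ Zy / n
    # Whiten both views with Cholesky factors: T = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    # Singular values of T are the (regularized) canonical correlations.
    return np.linalg.svd(T, compute_uv=False)


# Toy data (an assumption for illustration): y depends on x nonlinearly.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(500, 1))
y = x ** 2 + 0.05 * rng.normal(size=(500, 1))

Zx = random_fourier_features(x, num_features=50, gamma=1.0, rng=rng)
Zy = random_fourier_features(y, num_features=50, gamma=1.0, rng=rng)
correlations = linear_cca(Zx, Zy, reg=1e-3)
print("top canonical correlation:", correlations[0])
```

In this baseline every feature is drawn from the same data-independent distribution; a data-dependent or task-specific approach, as described in the abstract, would instead score candidate features and keep the ones most useful for the task.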
