Sparse Kernel Canonical Correlation Analysis via $\ell_1$-regularization

Abstract—Canonical correlation analysis (CCA) is a multivariate statistical technique for finding the linear relationship between two sets of variables. The kernel generalization of CCA, named kernel CCA, has been proposed to find nonlinear relations between data sets. Despite the wide usage of CCA and kernel CCA, both share one common limitation: their solutions lack sparsity. In this paper, we consider sparse kernel CCA and propose a novel sparse kernel CCA algorithm (SKCCA). Our algorithm is based on a relationship between kernel CCA and least squares. Sparsity of the dual transformations is introduced by penalizing the $\ell_1$-norm of the dual vectors. Experiments demonstrate that our algorithm not only performs well in computing sparse dual transformations but can also alleviate the over-fitting problem of kernel CCA.

Index Terms—canonical correlation analysis, kernel, sparsity

I. INTRODUCTION

The description of the relationship between two sets of variables has long been of interest to many researchers. Canonical correlation analysis (CCA) [10] is a multivariate statistical technique for finding the linear relationship between two sets of variables. It seeks a linear transformation for each of the two sets of variables such that the projected variables in the transformed space are maximally correlated. In recent years, CCA has been successfully applied in various areas, including genomic data analysis [19], [20] and bilingual analysis [18], where researchers can measure multiple sets of variables on a single subject. For instance, DNA copy number variations, gene expression, and single nucleotide polymorphism (SNP) data might all be available on a common set of patient samples.

Since CCA only considers linear transformations of the original variables, it cannot capture nonlinear relations.
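To make the linearity limitation concrete, the following is a minimal NumPy sketch of classical linear CCA under stated assumptions: each centered block is whitened and the top singular value of the whitened cross-covariance is taken as the first canonical correlation (a standard way to compute it, not the method of this paper). The function name `first_canonical_correlation`, the small ridge term `reg`, and the synthetic data (1000 points drawn uniformly on [-1, 1] with y = x^2) are illustrative assumptions. The extra x^2 column is a hand-built, explicit stand-in for the implicit feature space a kernel would provide; it is not kernel CCA itself.

```python
import numpy as np


def first_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between the columns of X (n x p) and Y (n x q).

    Linear CCA: whiten each centered block, then take the top singular value
    of the whitened cross-covariance. `reg` is a small ridge added for
    numerical stability (an assumption, not part of plain CCA).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    # Singular values of M are the canonical correlations, sorted descending.
    return np.linalg.svd(M, compute_uv=False)[0]


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = x ** 2  # a purely nonlinear (quadratic) relation

# Linear CCA on the raw variable versus on an explicitly expanded feature set.
linear = first_canonical_correlation(x, y)
expanded = first_canonical_correlation(np.hstack([x, x ** 2]), y)
print(f"linear CCA:         {linear:.3f}")
print(f"with x^2 feature:   {expanded:.3f}")
```

On this data the linear canonical correlation should come out near zero even though y is a deterministic function of x, while adding the x^2 column drives it to essentially 1, because y then lies exactly in the span of the features.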
This is a real limitation, because in a wide range of practical problems linear relations may not be adequate for studying the relations among variables. Detecting nonlinear relations among data is important and useful in modern data analysis, especially when dealing with data that are not in the form of vectors, such as text documents, images, and microarray data. A natural extension, therefore, is to explore and exploit nonlinear relations among data. Among the nonlinear extensions of CCA, one of the most frequently used approaches is the kernel generalization

[1] Delin Chu, et al. Sparse Canonical Correlation Analysis: New Formulation and Algorithm, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Sivaraman Balakrishnan, et al. Sparse Additive Functional and Kernel CCA, 2012, ICML.

[3] Jieping Ye, et al. Canonical Correlation Analysis for Multilabel Classification: A Least-Squares Formulation, Extensions, and Analysis, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] John Shawe-Taylor, et al. Sparse canonical correlation analysis, 2009, Machine Learning.

[5] R. Tibshirani, et al. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, 2009, Biostatistics.

[6] Daniela M Witten, et al. Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data, 2009, Statistical Applications in Genetics and Molecular Biology.

[7] Yin Zhang, et al. Fixed-Point Continuation for l1-Minimization: Methodology and Convergence, 2008, SIAM J. Optim.

[8] Jieping Ye, et al. A least squares formulation for canonical correlation analysis, 2008, ICML '08.

[9] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.

[10] Shotaro Akaho, et al. A kernel method for canonical correlation analysis, 2006, ArXiv.

[11] Stephen P. Boyd, et al. Optimal kernel selection in Kernel Fisher discriminant analysis, 2006, ICML.

[12] John Shawe-Taylor, et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods, 2004, Neural Computation.

[13] Yoshihiro Yamanishi, et al. Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis, 2003, ISMB.

[14] Nello Cristianini, et al. Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis, 2002, NIPS.

[15] Colin Fyfe, et al. Sparse Kernel Canonical Correlation Analysis, 2001, ESANN.

[16] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[17] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[18] N. Aronszajn. Theory of Reproducing Kernels, 1950.

[19] H. Hotelling. Relations Between Two Sets of Variates, 1936.