Canonical correlation analysis for multivariate regression and its application to metabolic fingerprinting

Abstract: Multivariate regression analysis is one of the most important tools in metabolomics studies. Partial least squares (PLS) has been widely used for regression of high-dimensional data. Canonical correlation analysis (CCA) is a classic method of multivariate analysis; however, it has rarely been applied to multivariate regression. In the present study, we applied PLS and regularized CCA (RCCA) to high-dimensional data in which the number of variables (p) exceeds the number of observations (N), that is, N ≪ p. Using kernel CCA with a linear kernel can drastically reduce the calculation time of RCCA. We applied these methods to gas chromatography–mass spectrometry (GC–MS) data that were analyzed to address the problem of ranking Japanese green tea. In constructing a quality-predictive model, the optimal number of latent variables determined by leave-one-out cross-validation (LOOCV) was significantly smaller for RCCA than for PLS. For metabolic fingerprinting, we successfully identified metabolites important for green tea grade classification using PLS and RCCA.
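The computational point about the linear kernel can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the function names `rcca_primal` and `rcca_linear_kernel` are hypothetical, a single ridge parameter `lam` is added to both unnormalized covariance matrices, and only the predictor block X is put in dual (kernel) form, which suits a response with few columns. Under these assumptions the identity X(XᵀX + λI)⁻¹Xᵀ = K(K + λI)⁻¹ with K = XXᵀ lets the p × p solve be replaced by an N × N solve, which is much cheaper when N ≪ p.

```python
import numpy as np


def _center(A):
    """Column-center a data matrix."""
    return A - A.mean(axis=0)


def rcca_primal(X, Y, lam):
    """First regularized canonical correlation using the p x p covariance.

    Hypothetical helper for illustration; cost is dominated by a p x p solve.
    """
    X, Y = _center(X), _center(Y)
    p, q = X.shape[1], Y.shape[1]
    Cxx = X.T @ X + lam * np.eye(p)   # regularized predictor covariance (unnormalized)
    Cyy = Y.T @ Y + lam * np.eye(q)   # regularized response covariance (unnormalized)
    Cxy = X.T @ Y
    # q x q matrix whose eigenvalues are the squared canonical correlations
    M = np.linalg.solve(Cyy, Cxy.T @ np.linalg.solve(Cxx, Cxy))
    rho2 = max(np.max(np.linalg.eigvals(M).real), 0.0)
    return float(np.sqrt(rho2))


def rcca_linear_kernel(X, Y, lam):
    """Same quantity, computed through the N x N linear kernel K = X X^T.

    Hypothetical helper for illustration; cost is dominated by an N x N solve.
    """
    X, Y = _center(X), _center(Y)
    N, q = X.shape[0], Y.shape[1]
    K = X @ X.T                                  # linear kernel (Gram matrix)
    Cyy = Y.T @ Y + lam * np.eye(q)
    A = np.linalg.solve(K + lam * np.eye(N), Y)  # (K + lam I)^{-1} Y
    M = np.linalg.solve(Cyy, Y.T @ K @ A)        # equals the primal M
    rho2 = max(np.max(np.linalg.eigvals(M).real), 0.0)
    return float(np.sqrt(rho2))


if __name__ == "__main__":
    # Synthetic stand-in for N << p data; not the green tea GC-MS data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 500))
    Y = rng.standard_normal((20, 1))
    print(rcca_primal(X, Y, lam=1.0), rcca_linear_kernel(X, Y, lam=1.0))
```

Both functions should agree up to floating-point error, since they evaluate the same q × q eigenproblem; the kernel version never forms a p × p matrix.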
