Dimensionality reduction for metabolome data using PCA, PLS, OPLS, and RFDA with differential penalties to latent variables

Dimensionality reduction is an important technique for preprocessing high-dimensional data. Because only one aspect of the original data is represented in a low-dimensional subspace, useful information may be lost. In the present study, novel dimensionality reduction methods were developed that are suitable for metabolome data in which observations vary with time. Metabolomics often deals with this type of data, which are frequently obtained from microorganism fermentation processes. However, no dimensionality reduction method that actively utilizes this time-course information in the original data has been reported to date. The ordinary dimensionality reduction methods of principal component analysis (PCA), partial least squares (PLS), orthonormalized PLS (OPLS), and regularized Fisher discriminant analysis (RFDA) were extended by introducing differential penalties to the latent variables in each class. A nonlinear extension of this approach, using kernel methods, was also proposed in the form of kernel-smoothed PCA, PLS, OPLS, and FDA. Since all of these methods are formulated as generalized eigenvalue problems, the solutions can be computed easily. The methods were then applied to intracellular metabolite data of a xylose-fermenting yeast in ethanol fermentation. Visualization in the low-dimensional subspace suggests that smoothed PCA successfully preserves information about the time course of observations during fermentation, and that RFDA can produce strong separation among different strains.
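The generalized eigenvalue formulation can be illustrated for the smoothed PCA case. The following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes, since the abstract does not give the exact form, that the differential penalty is the squared second-order difference of the score vector t = Xw along the time axis, that the rows of X are ordered by time, and that all observations belong to a single class. Under these assumptions the weight vectors solve X'X w = λ (I + κ X'D'DX) w. The function names and the parameter `kappa` are hypothetical.

```python
import numpy as np


def second_difference_matrix(n):
    """Return the (n-2) x n matrix D with (D @ t)[i] = t[i] - 2*t[i+1] + t[i+2]."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def smoothed_pca(X, kappa=1.0, n_components=2):
    """Sketch of PCA with a roughness penalty on the scores (assumed form).

    Solves A w = lambda * B w with
        A = Xc' Xc                      (variance term)
        B = I + kappa * Xc' D' D Xc     (norm plus differential penalty)
    where Xc is the column-centred data and rows are time-ordered.
    """
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    D = second_difference_matrix(n)

    A = Xc.T @ Xc
    DX = D @ Xc
    B = np.eye(p) + kappa * (DX.T @ DX)  # positive definite for kappa >= 0

    # Reduce the generalized problem to a standard symmetric one:
    # B = L L'  =>  C = L^{-1} A L^{-T},  C v = lambda v,  w = L^{-T} v.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, A)        # L^{-1} A
    C = np.linalg.solve(L, M.T).T    # L^{-1} A L^{-T}
    C = (C + C.T) / 2.0              # remove round-off asymmetry

    evals, V = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1][:n_components]
    W = np.linalg.solve(L.T, V[:, order])  # generalized eigenvectors
    scores = Xc @ W                         # latent variables (time-ordered)
    return evals[order], W, scores
```

Calling `smoothed_pca(X, kappa=10.0)` on a time-ordered samples-by-metabolites matrix returns the leading eigenvalues, the weight vectors, and the scores. Setting `kappa=0.0` reduces the problem to ordinary PCA, and larger values of `kappa` favour scores that change smoothly over time.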
