K-OPLS package: Kernel-based orthogonal projections to latent structures for prediction and interpretation in feature space

Background
Kernel-based classification and regression methods have been successfully applied to modelling a wide variety of biological data. The Kernel-based Orthogonal Projections to Latent Structures (K-OPLS) method offers unique properties that facilitate separate modelling of predictive variation and structured noise in the feature space. While providing prediction results similar to those of other kernel-based methods, K-OPLS offers enhanced interpretational capabilities, allowing detection of unanticipated systematic variation in the data such as instrumental drift, batch variability or unexpected biological variation.

Results
We demonstrate an implementation of the K-OPLS algorithm for MATLAB and R, licensed under the GNU GPL and available at http://www.sourceforge.net/projects/kopls/. The package includes essential functionality and documentation for model evaluation (using cross-validation), training and prediction of future samples. Also included is a set of diagnostic tools and plot functions that simplify the visualisation of data, e.g. for detecting trends or identifying outlying samples. The utility of the software package is demonstrated by means of a metabolic profiling data set from a biological study of hybrid aspen.

Conclusion
The properties of the K-OPLS method are well suited to the analysis of biological data. In conjunction with the availability of the outlined open-source package, this provides a comprehensive solution for kernel-based analysis in bioinformatics applications.
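
As a minimal illustration of what working in the feature space involves, the sketch below builds a Gaussian kernel matrix from a small data set and centres it implicitly in feature space, a preprocessing step common to kernel-based latent variable methods. It is not the K-OPLS package API: the function names, the `sigma` parameter and the toy data are assumptions made for this example, and the code uses only the Python standard library.

```python
import math


def gaussian_kernel_matrix(X, sigma):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    X is a list of samples (each a list of floats); sigma is the kernel
    width, a hypothetical parameter name chosen for this sketch.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


def center_kernel_matrix(K):
    """Centre a symmetric kernel matrix in feature space.

    Computes K - 1K/n - K1/n + 1K1/n^2 element-wise, which corresponds to
    subtracting the feature-space mean without forming the feature vectors.
    Because K is symmetric, row means and column means coincide.
    """
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Toy data: three samples with two variables each (made up for illustration).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel_matrix(X, sigma=1.0)
Kc = center_kernel_matrix(K)
```

After centring, every row and column of `Kc` sums to zero up to floating-point error, which is a quick sanity check that the feature-space mean has been removed.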
