libPLS: An integrated library for partial least squares regression and linear discriminant analysis

Abstract Partial least squares (PLS) has found wide application, especially in chemometrics, metabolomics/metabonomics, and bioinformatics. Here, we present libPLS, a library that integrates not only basic PLS modeling algorithms but also advanced and recently developed methods for model assessment, outlier detection, and variable selection. The package features a set of Model Population Analysis (MPA)-type approaches that have not previously been integrated into a single package, and it thus functionally complements existing toolboxes. libPLS provides an integrated platform for developing PLS regression and PLS linear discriminant analysis (PLS-LDA) models. It is written in MATLAB and is freely available at www.libpls.net.
