Application of Discriminant Analysis and Cross-Validation on Proteomics Data

High-throughput proteomic experiments have increased both the importance and the complexity of the bioinformatic analysis needed to extract useful information from raw data. Discriminant analysis is frequently used to identify differences among test groups of individuals or to describe combinations of discriminant variables. However, even in relatively large studies, the number of detected variables typically far exceeds the number of samples, so classifiers should be thoroughly validated to assess their performance on new samples. Cross-validation is a widely used approach when an external validation set is not available. This chapter presents different approaches to cross-validation, including aspects that should be taken into account to avoid overly optimistic results, and the assessment of the statistical significance of cross-validated figures of merit.
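As a rough illustration of the two ideas named above (cross-validated performance and its statistical significance), the following minimal sketch may help. It is not the chapter's method: a nearest-centroid rule stands in for a discriminant classifier, and the function names (`nearest_centroid_fit`, `cv_accuracy`, `permutation_p_value`) and parameters (`k`, `n_perm`, `seed`) are hypothetical. The model is refitted inside every fold, so held-out samples never influence the classifier that predicts them. The p-value is estimated by repeating the whole cross-validation on permuted class labels.

```python
import random


def nearest_centroid_fit(X, y):
    """Return one mean vector per class (a simple stand-in classifier)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda lab: sqdist(centroids[lab]))


def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; the model is refitted in every fold."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on training samples only; any variable selection or scaling
        # would also have to happen here, inside the loop.
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i]
                       for i in fold)
    return correct / len(X)


def permutation_p_value(X, y, n_perm=200, k=5, seed=0):
    """Estimate how often permuted labels reach the observed CV accuracy."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k=k, seed=seed)
    exceed = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm, k=k, seed=seed) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation,
    # so the estimate can never be exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `permutation_p_value(X, y)` with `X` as a list of feature lists and `y` as a list of class labels returns the cross-validated accuracy and its permutation p-value. The sketch assumes every class is present in each training split, which holds for reasonably balanced groups.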
