A Classification Model for the Leiden Proteomics Competition

A strategy is presented for building a discrimination model in proteomics studies. The model is built using cross-validation, and this cross-validation step is easily combined with a variable selection method called rank products. The strategy is especially suitable for the case of a low samples-to-variables ratio (undersampling), which is often encountered in proteomics and metabolomics studies. Principal Component Discriminant Analysis is used as the classification method; however, the methodology can be used with any classifier. A data set containing serum samples from breast cancer patients and healthy controls is analysed. Double cross-validation shows that the sensitivity of the model is 82% and the specificity is 86%. Putative biomarkers are identified using the variable selection method. In each cross-validation loop a classification model is built, and the final classification is obtained by majority voting over the resulting ensemble of classifiers.
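The two combining steps named in the abstract, rank products over the cross-validation models and majority voting over the ensemble, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each cross-validation loop yields one discriminant weight per variable and that a larger absolute weight means a more important variable. The function names, the example weights, and the class labels are all hypothetical.

```python
import math
from collections import Counter


def rank_scores(scores):
    """Rank variables within one cross-validation model.

    Rank 1 goes to the variable with the largest absolute score
    (assumption: |score| measures importance). Ties are broken by
    variable index, because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda j: -abs(scores[j]))
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def rank_products(fold_scores):
    """Geometric mean of each variable's rank across all models.

    A small rank product means the variable was ranked near the top
    consistently, which makes it a candidate (putative) biomarker.
    """
    fold_ranks = [rank_scores(s) for s in fold_scores]
    n_folds = len(fold_ranks)
    n_vars = len(fold_ranks[0])
    return [
        math.exp(sum(math.log(r[j]) for r in fold_ranks) / n_folds)
        for j in range(n_vars)
    ]


def majority_vote(predictions):
    """Return the label predicted by most models for one sample.

    With an even number of models a tie is possible; here the label
    that appears first among the tied ones wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical weights: three cross-validation models, three variables.
fold_scores = [
    [0.9, -0.1, 0.5],
    [0.8, 0.2, -0.6],
    [0.3, 0.05, 0.4],
]
rp = rank_products(fold_scores)
selected = sorted(range(len(rp)), key=lambda j: rp[j])  # best variable first

# Hypothetical votes of three ensemble members for one new sample.
label = majority_vote(["cancer", "control", "cancer"])
```

Using the geometric mean rather than the raw product keeps the values comparable when the number of cross-validation models changes. Any classifier that yields a per-variable score could supply `fold_scores`.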