Mass Spectrometry Proteomic Diagnosis: Enacting the Double Cross-Validatory Paradigm

This paper presents an approach to evaluating and validating the diagnostic potential of mass spectrometry data, applied to the construction of an "early warning" diagnostic procedure. Our approach is based on a full implementation and application of double cross-validatory calibration and evaluation. A key feature of this methodology is that it jointly optimizes the classifiers for prediction while simultaneously computing validated error rates, and it leaves the size of the training data nearly intact. We present an application to data from a designed experiment in a colon-cancer study. Following the results of the double cross-validatory analysis, we carry out a post-hoc analysis of the calibrated classifiers to identify the markers that drive the classification.
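To make the double cross-validatory idea concrete, here is a minimal sketch in Python (NumPy only). It is not the authors' implementation, and it rests on several assumptions.

- **Classifier.** The classifier is assumed, purely for illustration, to project spectra onto the first k principal components and then apply Fisher's linear discriminant, with k as the tuning parameter to be calibrated.
- **Loop structure.** Both loops are assumed to be leave-one-out. The outer loop holds out one sample for error estimation. The inner loop calibrates k using only the remaining n - 1 samples.
- **Training size.** Each final fit therefore uses n - 1 samples, which is one way of leaving the training size nearly intact.
- **Names.** `fit_pca_lda`, `predict`, `loo_error` and `double_cv` are hypothetical names.

```python
import numpy as np


def fit_pca_lda(X, y, k):
    """Hypothetical classifier: project onto the first k principal components,
    then apply Fisher's linear discriminant with equal class priors."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal axes
    V = Vt[:k].T                                       # p x k loadings
    Z = Xc @ V                                         # training scores
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    # Pooled within-class covariance of the scores (tiny ridge for stability).
    D = np.vstack([Z[y == 0] - m0, Z[y == 1] - m1])
    S = D.T @ D / (len(y) - 2) + 1e-9 * np.eye(k)
    w = np.linalg.solve(S, m1 - m0)                    # discriminant direction
    c = w @ (m0 + m1) / 2                              # midpoint threshold
    return mean, V, w, c


def predict(model, X):
    """Assign class 1 when the discriminant score exceeds the threshold."""
    mean, V, w, c = model
    return ((X - mean) @ V @ w > c).astype(int)


def loo_error(X, y, k):
    """Inner loop: leave-one-out error rate of the k-component rule on (X, y)."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_pca_lda(X[keep], y[keep], k)
        errors += int(predict(model, X[i:i + 1])[0] != y[i])
    return errors / n


def double_cv(X, y, ks):
    """Outer leave-one-out loop for error estimation; k is calibrated by an
    inner leave-one-out loop that never sees the held-out sample."""
    n = len(y)
    outer_errors, chosen = 0, []
    for i in range(n):
        keep = np.arange(n) != i
        Xtr, ytr = X[keep], y[keep]
        inner = [loo_error(Xtr, ytr, k) for k in ks]   # calibration on n - 1
        k_best = ks[int(np.argmin(inner))]
        chosen.append(k_best)
        model = fit_pca_lda(Xtr, ytr, k_best)          # refit with chosen k
        outer_errors += int(predict(model, X[i:i + 1])[0] != y[i])
    return outer_errors / n, chosen


# Usage on synthetic data (not the colon-cancer data): 20 controls, 20 cases,
# with cases shifted in 5 of 30 "spectral" features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 30))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 3.0
err, ks_used = double_cv(X, y, ks=[1, 2, 3])
print(f"double-CV error rate: {err:.3f}; k chosen per outer fold: {ks_used}")
```

The design choice that matters is that the inner calibration is repeated inside every outer fold. The reported error rate therefore reflects the whole procedure, including the choice of k, rather than a single rule tuned on all the data. Different outer folds may also select different values of k.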
