Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

Background: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty of reconciling the lists of discriminatory peaks that different laboratories identify for the same disease. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each method by applying them all to the same high-resolution mass spectral data set, in order to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.

Results: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states, since they do not result from the biases of a particular statistical algorithm. Instead, they were selected as differential under differing data distribution assumptions, which supports their discriminatory potential.

Conclusion: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift, which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model, as evaluated by receiver operating characteristic curve analysis. They should demonstrate a higher discriminatory ability because they are not biased toward a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.
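
The consensus step described above can be illustrated with a minimal sketch. It assumes each statistical method has already produced its own set of differential peaks and keeps only those flagged by at least three of the four methods. The function name, the `min_methods` parameter, and the peak labels are hypothetical and are not taken from the study; how the original analysis matched peaks across methods is not stated in the abstract.

```python
from collections import Counter


def consensus_peaks(selections, min_methods=3):
    """Return peaks flagged as differential by at least `min_methods` methods.

    `selections` maps a method name to the set of peak labels
    (for example, binned m/z values) that the method selected.
    """
    # Count, for every peak, how many methods selected it.
    counts = Counter(
        peak for peaks in selections.values() for peak in set(peaks)
    )
    return sorted(peak for peak, n in counts.items() if n >= min_methods)


# Hypothetical peak labels, for illustration only (not values from the study).
selections = {
    "logistic_regression": {1001, 1002, 1003},
    "hierarchical_clustering": {1002, 1004},
    "t_test": {1001, 1002, 1005},
    "cart": {1001, 1002, 1003},
}

print(consensus_peaks(selections))
```

Raising `min_methods` to 4 would keep only peaks that every method agrees on, which trades sensitivity for stricter agreement.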
