Identifying Biomarkers from Mass Spectrometry Data with Ordinal Outcome

Summary: In recent years, there has been increased interest in using protein mass spectrometry to identify molecular markers that discriminate diseased from healthy individuals. Existing methods are tailored towards classifying observations into nominal categories. Sometimes, however, the outcome of interest is measured on an ordered scale, and ignoring this natural ordering results in some loss of information. In this paper, we propose a Bayesian model for the analysis of mass spectrometry data with an ordered outcome. The method provides a unified approach for identifying relevant markers and predicting class membership. This is accomplished by building a stochastic search variable selection method within an ordinal outcome model. We apply the methodology to mass spectrometry data on ovarian cancer cases and healthy individuals, using wavelet-based techniques to remove noise from the mass spectra prior to analysis. We identify protein markers associated with being healthy, having low grade ovarian cancer, or being a high grade case. For comparison, we repeated the analysis using conventional classification procedures and found improved predictive accuracy with our method.
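As a rough illustration of how an ordinal outcome model combines with variable selection, the sketch below computes class probabilities from a latent-variable model with ordered cutpoints, where binary inclusion indicators switch candidate markers in or out of the linear predictor. The probit link, the function names, and all numeric values are assumptions made for illustration; the summary does not specify them, and the sketch omits the prior distributions and the stochastic search over the indicators.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear_predictor(x, beta, gamma):
    """Sum of beta_j * x_j over markers whose inclusion indicator gamma_j is 1."""
    return sum(g * b * xj for xj, b, g in zip(x, beta, gamma))

def ordinal_probit_probs(eta, cutpoints):
    """P(y = k) for k = 0..K-1, given linear predictor eta and increasing cutpoints.

    Assumes a probit link: P(y <= k) = Phi(cutpoints[k] - eta).
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - eta) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]

def predicted_class(probs):
    """Index of the most probable ordered category."""
    return max(range(len(probs)), key=lambda k: probs[k])

# Hypothetical example: three candidate markers, two of them selected.
x = [0.8, -1.2, 0.4]       # denoised intensities for one spectrum (made up)
beta = [1.5, 0.7, -2.0]    # regression coefficients (made up)
gamma = [1, 0, 1]          # inclusion indicators: marker 2 is excluded
cutpoints = [-1.0, 1.0]    # separate healthy / low grade / high grade

eta = linear_predictor(x, beta, gamma)
probs = ordinal_probit_probs(eta, cutpoints)
print(probs, predicted_class(probs))
```

In a full stochastic search, the indicators in `gamma` would be sampled along with the other parameters, so that markers with high posterior inclusion frequency are the ones reported as relevant.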
