An intensity-region driven multi-classifier scheme for improving the classification accuracy of proteomic MS-spectra

In this study, a pattern recognition system is presented that improves the classification accuracy of MS-spectra by gathering information from different MS-spectra intensity regions and combining it through a majority-vote ensemble. The method first automatically breaks all MS-spectra down into common intensity regions. The most informative features (m/z values), which may constitute potentially significant biomarkers, are then extracted from each common intensity region across all MS-spectra. Finally, normal and ovarian cancer MS-spectra are discriminated by a multi-classifier scheme whose members are the Support Vector Machine (SVM), the Probabilistic Neural Network (PNN) and the k-Nearest Neighbour (k-NN) classifiers. Clinical material was obtained from the publicly available ovarian proteomic dataset (8-7-02). To ensure robust and reliable estimates, the proposed pattern recognition system was evaluated with an external cross-validation process. Its average overall accuracy in discriminating normal from ovarian cancer MS-spectra was 97.18%, with a mean sensitivity of 98.52% and a mean specificity of 94.84%.
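The classification and evaluation steps described above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions that are not taken from the paper: labels are coded 0 = normal and 1 = cancer; m/z features are ranked with a simple Welch-style t statistic; the PNN is written as a Gaussian Parzen-window classifier with a hypothetical smoothing width `sigma`; and, because an SVM solver does not fit in a short sketch, the SVM member is replaced by a nearest-centroid classifier, which is not an SVM. The intensity-region decomposition is not modelled, and all function names are illustrative. What the sketch does show is a hard majority vote over three classifier members and an external cross-validation loop in which feature selection is repeated inside every training fold, so held-out spectra never influence which m/z values are chosen.

```python
import numpy as np

# Label convention (assumption): 0 = normal spectrum, 1 = ovarian cancer spectrum.


def knn_predict(X_train, y_train, X_test, k=3):
    """k-Nearest Neighbour: majority label among the k closest training spectra."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # With 0/1 labels and odd k, a mean above 0.5 means most neighbours are cancer
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def pnn_predict(X_train, y_train, X_test, sigma=1.0):
    """PNN as a Gaussian Parzen-window classifier; sigma is a hypothetical width."""
    scores = []
    for c in (0, 1):
        Xc = X_train[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        # Average kernel response = class-conditional density estimate
        scores.append(np.exp(-d2 / (2.0 * sigma ** 2)).mean(axis=1))
    return np.argmax(np.stack(scores, axis=1), axis=1)


def centroid_predict(X_train, y_train, X_test):
    """STAND-IN for the SVM member (this is NOT an SVM): nearest class centroid."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def majority_vote(predictions):
    """Combine an odd number of 0/1 label vectors by simple (hard) majority."""
    P = np.asarray(predictions)
    return (2 * P.sum(axis=0) > P.shape[0]).astype(int)


def sens_spec_acc(y_true, y_pred):
    """Sensitivity (cancer recall), specificity (normal recall) and accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)


def select_features(X, y, n):
    """Rank m/z features by |Welch t statistic| (a hypothetical criterion)."""
    a, b = X[y == 0], X[y == 1]
    t = np.abs(b.mean(0) - a.mean(0)) / np.sqrt(
        a.var(0) / len(a) + b.var(0) / len(b) + 1e-12)
    return np.argsort(t)[::-1][:n]


def external_cv(X, y, n_features=10, n_folds=5, seed=0):
    """External CV: feature selection is redone on each training fold only."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_pred = np.empty_like(y)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        feats = select_features(X[train], y[train], n_features)
        Xtr, Xte = X[train][:, feats], X[test][:, feats]
        votes = [m(Xtr, y[train], Xte)
                 for m in (centroid_predict, pnn_predict, knn_predict)]
        y_pred[test] = majority_vote(votes)
    return sens_spec_acc(y, y_pred)


if __name__ == "__main__":
    # Synthetic toy data only: 40 "spectra", 50 m/z features, 5 informative
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 50))
    X[y == 1, :5] += 2.0
    print(external_cv(X, y))
```

In a fuller implementation a library SVM (for example scikit-learn's `SVC`) would take the place of `centroid_predict`, and `sigma`, `k` and `n_features` would have to be tuned on the training folds alone to keep the external estimate unbiased.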
