Proteomic Biomarker Identification for Diagnosis of Early Relapse in Ovarian Cancer

Ovarian cancer recurs at a rate of 75%, anywhere from a few months to several years after therapy. Early recurrence, although it responds better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown the potential to accurately identify disease biomarkers that aid early diagnosis. A major challenge in interpreting SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of a t-test, binning, and backward feature selection. A new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE), is presented for the backward feature selection. This method integrates minimum-weight feature elimination by SVM-RFE with information-theory-based removal of redundant and irrelevant features by Markov blanket. An SVM was subsequently used for classification. We applied the biomarker selection algorithm to 113 serum samples to identify early relapse in ovarian cancer patients after primary therapy. To validate the performance of the proposed algorithm, we compared it experimentally with several other feature selection and classification algorithms.
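
The multi-step processing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the t-test variant, the binning rule, or the SVM solver, so the Welch statistic, the fixed-width averaging of adjacent features, and the subgradient-descent linear SVM below are all assumptions, as are the function names and parameters. The Markov blanket redundancy step of SVM-MB/RFE is not reproduced; only the minimum-weight elimination of SVM-RFE is shown. Class labels are assumed to be +1 and -1.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed t-test variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def ttest_filter(X, y, threshold):
    """Return indices of features whose |t| between classes exceeds threshold."""
    keep = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        if abs(welch_t(pos, neg)) > threshold:
            keep.append(j)
    return keep


def bin_features(X, width):
    """Average each run of `width` adjacent features (assumed binning rule)."""
    return [[mean(row[i:i + width]) for i in range(0, len(row), width)]
            for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM without bias, trained by deterministic subgradient
    descent on the regularized hinge loss (Pegasos-style step size)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for row, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, row))
            w = [wi * (1 - eta * lam) for wi in w]  # shrink (regularizer)
            if margin < 1:  # margin violated: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, row)]
    return w


def svm_rfe(X, y, n_keep):
    """Backward elimination: retrain, then drop the feature with the
    smallest squared weight, until n_keep features remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active
```

On real spectra the three steps would be chained: filter m/z points with `ttest_filter`, merge neighbouring points with `bin_features`, then pass the binned matrix to `svm_rfe`. Because the toy SVM has no bias term, features are assumed to be centered beforehand.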
