Data mining techniques for cancer detection using serum proteomic profiling

OBJECTIVE Pathological changes in an organ or tissue may be reflected in proteomic patterns in serum, so unique serum proteomic patterns could potentially be used to discriminate cancer samples from non-cancer ones. Because proteomic profiles are complex, a higher-order analysis such as data mining is needed to uncover the differences between them. The objectives of this paper are (1) to briefly review the application of data mining techniques in proteomics for cancer detection and diagnosis; (2) to explore a novel analytic method with different feature selection methods; and (3) to compare the results obtained on different data sets with those reported by Petricoin et al. in terms of detection performance and selected proteomic patterns.

METHODS AND MATERIAL Three serum SELDI MS (surface-enhanced laser desorption/ionization mass spectrometry) data sets were used to identify serum proteomic patterns that distinguish the serum of ovarian cancer cases from that of non-cancer controls. A support vector machine-based method was applied, with feature selection performed either by statistical testing or by a genetic algorithm. Leave-one-out cross-validation with receiver operating characteristic (ROC) curves was used to evaluate and compare cancer detection performance.

RESULTS AND CONCLUSIONS The results showed that (1) data mining techniques can be successfully applied to ovarian cancer detection with reasonably high performance; (2) classification using features selected by the genetic algorithm consistently outperformed classification using features selected by statistical testing in terms of accuracy and robustness; and (3) the discriminatory features (proteomic patterns) can differ greatly from one selection method to another. In other words, pattern selection and its classification efficiency are highly classifier dependent. Therefore, when data mining techniques are used, the discrimination of cancer from normal does not depend solely on the identity and origin of cancer-related proteins.
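The evaluation scheme described above (feature selection by statistical testing, leave-one-out cross-validation, ROC analysis) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, all function names are hypothetical, Welch's t-statistic is assumed as the statistical test, and a simple nearest-centroid scorer stands in for the support vector machine so that the example needs only the standard library. The genetic algorithm variant is not shown. Features are re-selected inside every fold so the held-out sample never influences the selection.

```python
import math
import random

def welch_t(a, b):
    """Welch's t-statistic between two lists of values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / denom if denom > 0 else 0.0

def select_features(X, y, k):
    """Rank features by |t| on the given samples and return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_score(X, y, feats, x_new):
    """Stand-in for the SVM decision value (assumption): distance to the
    control centroid minus distance to the cancer centroid."""
    def centroid(label):
        rows = [r for r, l in zip(X, y) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    def dist(c):
        return math.sqrt(sum((x_new[j] - cj) ** 2 for j, cj in zip(feats, c)))
    return dist(centroid(0)) - dist(centroid(1))

def loocv_scores(X, y, k):
    """Leave-one-out: hold out each sample, select features and fit on the rest."""
    scores = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k)  # selection uses training data only
        scores.append(centroid_score(X_tr, y_tr, feats, X[i]))
    return scores

def roc_auc(y, scores):
    """Area under the ROC curve, computed as the fraction of (cancer, control)
    pairs in which the cancer sample scores higher (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, y) if l == 1]
    neg = [s for s, l in zip(scores, y) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n_per_class=20, n_features=50, n_informative=3, seed=0):
    """Synthetic 'spectra': Gaussian noise, with the first few features
    shifted upward in the cancer class (label 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += 2.0 * label
            X.append(row)
            y.append(label)
    return X, y

X, y = make_synthetic()
auc = roc_auc(y, loocv_scores(X, y, k=5))
print(f"LOOCV AUC: {auc:.3f}")
```

Swapping a different feature selector into `select_features` while leaving the cross-validation loop unchanged is one way to compare selection methods on equal footing, which is the kind of comparison the abstract reports.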
