Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study

To satisfy the ever-growing need for effective screening and diagnostic tests, medical practitioners have turned their attention to high-resolution, high-throughput methods. One approach is to use mass-spectrometry-based methods for disease diagnosis, in which diagnosis is achieved by classifying mass spectra as belonging to healthy or diseased individuals. Unfortunately, high-resolution mass spectrometry data contain a large amount of noisy, redundant, and irrelevant information, which makes accurate classification difficult. To overcome these obstacles, feature extraction methods are used to select or create small sets of relevant features. This paper compares existing feature selection methods to a novel wrapper-based feature selection and centroid-based classification method. A key contribution is the exposition of different feature extraction techniques, which encompass both dimensionality reduction and feature selection methods. Experiments on two cancer data sets indicate that feature selection algorithms tend to both reduce data dimensionality and increase classification accuracy, whereas dimensionality reduction techniques sacrifice classification performance as they lower the number of features. To evaluate the dimensionality reduction and feature selection techniques, we use a simple classifier, which keeps the approach tractable. Relative to previous research, the proposed algorithm is very competitive in terms of (i) classification accuracy, (ii) size of feature sets, and (iii) usage of computational resources during both the training and classification phases.
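As a rough illustration of what a wrapper-based feature selector paired with a centroid-based classifier can look like, the following is a minimal sketch under stated assumptions, not the algorithm proposed in the paper. The function names (`centroid_predict`, `loo_accuracy`, `forward_select`), the greedy forward search, the Euclidean distance, and the leave-one-out scoring are all choices made for this example only.

```python
import numpy as np

def centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class with the nearest training centroid."""
    classes = np.unique(y_train)
    # One mean spectrum (centroid) per class, shape (n_classes, n_features).
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test row to every centroid.
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the centroid classifier on a feature subset."""
    Xs = X[:, features]
    n = len(y)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i  # hold out sample i
        pred = centroid_predict(Xs[keep], y[keep], Xs[i:i + 1])[0]
        hits += int(pred == y[i])
    return hits / n

def forward_select(X, y, max_features=5):
    """Greedy wrapper: add the feature that most improves accuracy, stop when none does."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = [loo_accuracy(X, y, selected + [j]) for j in candidates]
        k = int(np.argmax(scores))
        if scores[k] <= best:  # no strict improvement: stop searching
            break
        selected.append(candidates[k])
        best = scores[k]
    return selected, best
```

Here `X` is assumed to be a samples-by-features array (for mass spectra, one column per m/z bin) and `y` a vector of class labels. The wrapper scores each candidate subset by running the classifier itself, which is what distinguishes it from filter-style selection; a simple classifier such as the centroid rule keeps that repeated evaluation affordable.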
