Feature extraction and dimensionality reduction for mass spectrometry data

Mass spectrometry is used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early-stage cancer. However, the high dimensionality of mass spectrometry data poses considerable challenges. In this paper we propose a feature extraction algorithm based on wavelet analysis for high-dimensional mass spectrometry data. A set of wavelet detail coefficients at different scales is used to detect transient changes in the mass spectrometry data. Experiments are performed on two datasets. The accuracy achieved is highly competitive with the best performance of other kinds of classification models. Experimental results show that the wavelet detail coefficients are an efficient way to characterize the features of high-dimensional mass spectra and to reduce their dimensionality.
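As a rough illustration of the idea in the abstract, the sketch below computes detail coefficients at several scales and keeps a subset of them as a lower-dimensional feature vector. It is only a sketch under stated assumptions: the Haar wavelet is used because it is the simplest to write out by hand, and the abstract does not say which wavelet family, how many decomposition levels, or which scales the paper uses. The function names, the `levels` and `keep_from` parameters, and the synthetic spectrum are all hypothetical.

```python
import numpy as np


def haar_detail_coefficients(signal, levels):
    """Return Haar detail-coefficient arrays, finest scale first.

    Assumption: Haar is a stand-in wavelet chosen for simplicity.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:
            # Pad odd-length input by repeating the last sample.
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        # Detail = scaled difference of neighbours (captures transient changes).
        details.append((even - odd) / np.sqrt(2.0))
        # Approximation = scaled sum of neighbours (passed to the next level).
        approx = (even + odd) / np.sqrt(2.0)
    return details


def extract_features(spectrum, levels=4, keep_from=3):
    """Concatenate detail coefficients from level `keep_from` to `levels`.

    `keep_from` is a hypothetical knob: dropping the finest levels shrinks
    the feature vector, since each level halves the coefficient count.
    """
    details = haar_detail_coefficients(spectrum, levels)
    return np.concatenate(details[keep_from - 1:])


# Synthetic stand-in for one spectrum: a single peak plus small noise.
rng = np.random.default_rng(0)
mz_index = np.arange(1024)
spectrum = np.exp(-0.5 * ((mz_index - 400) / 6.0) ** 2)
spectrum += 0.01 * rng.standard_normal(mz_index.size)

features = extract_features(spectrum, levels=4, keep_from=3)
print(spectrum.size, "->", features.size)
```

With these illustrative settings, a 1024-point spectrum is reduced to the 128 + 64 = 192 detail coefficients of levels 3 and 4. Feature vectors of this kind could then be passed to any classifier; which classifier the paper uses is not stated in the abstract.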
