Dimensionality reduction and main component extraction of mass spectrometry cancer data

Mass spectrometry data are high-dimensional, and dimensionality reduction is an important step for improving how well cancer tissue can be distinguished from normal tissue. In this study, multilevel wavelet analysis is performed on high-dimensional mass spectrometry data. A set of orthogonal wavelet basis functions for the approximation coefficients is extracted to reduce the dimensionality of the mass spectra and to represent the main components of the data. The best level of wavelet decomposition is selected based on the energy distribution of the approximation coefficients. Compared with the traditional principal component analysis (PCA) method, which depends on training samples to build the feature space, the proposed method uses a wavelet basis to extract the main components of the mass spectra, preserves local properties of the data, and is computationally efficient. Experiments are conducted on three datasets, and the method achieves competitive performance compared with other feature extraction and feature selection methods.
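The idea of keeping only approximation coefficients and choosing the decomposition level from their energy can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not taken from the study: the Haar wavelet stands in for whichever orthogonal wavelet was actually used, the function names (`haar_step`, `select_level`, `extract_features`) are hypothetical, and the rule "deepest level whose approximation still holds at least 95% of the signal energy" is only one plausible reading of selecting the level by energy distribution.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approx, detail)."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd lengths by repeating the last sample
    scale = 1.0 / math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) * scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) * scale for i in range(0, len(values), 2)]
    return approx, detail


def approximations(signal, max_level):
    """Approximation coefficients at levels 1..max_level (fewer if the signal runs out)."""
    levels = []
    current = list(signal)
    for _ in range(max_level):
        if len(current) < 2:
            break
        current, _detail = haar_step(current)  # detail coefficients are discarded
        levels.append(current)
    return levels


def energy(values):
    """Sum of squared coefficients."""
    return sum(v * v for v in values)


def select_level(signal, max_level, min_energy_fraction=0.95):
    """Deepest level whose approximation keeps at least the given energy fraction.

    ASSUMPTION: the 0.95 threshold and this stopping rule are illustrative only.
    Returns 0 if even level 1 falls below the threshold.
    """
    total = energy(signal)
    if total == 0:
        return 0
    best = 0
    for level, approx in enumerate(approximations(signal, max_level), start=1):
        if energy(approx) / total < min_energy_fraction:
            break
        best = level
    return best


def extract_features(signal, level):
    """Reduced feature vector: approximation coefficients at the chosen level."""
    if level < 1:
        return list(signal)  # no decomposition: keep the raw spectrum
    return approximations(signal, level)[-1]


if __name__ == "__main__":
    # Synthetic smooth "spectrum" (not real data): one broad peak over 64 points.
    spectrum = [math.exp(-((i - 32) / 10.0) ** 2) for i in range(64)]
    level = select_level(spectrum, max_level=5)
    features = extract_features(spectrum, level)
    print(level, len(spectrum), len(features))
```

Each level halves the number of coefficients, so a spectrum of length N is reduced to roughly N / 2^level features. Unlike PCA, the basis is fixed in advance and does not need to be fitted to training samples.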
