Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery

Background
As a novel cancer diagnostic paradigm, mass spectrometric serum proteomic pattern diagnostics has been reported to outperform conventional serologic cancer biomarkers. However, its clinical use has not yet been fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, principal component analysis (PCA) is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism, in which each principal component is a linear combination of all input features, prevents it from capturing local features. This can make high-performance proteomic pattern discovery difficult, because only features that interpret global data behavior are used to train a learning machine.

Methods
In this study, we develop a nonnegative principal component analysis (NPCA) algorithm and present an NPCA-based support vector machine (SVM) algorithm with sparse coding for high-performance proteomic pattern classification. We also propose an NPCA-based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles.

Results
We demonstrate the superiority of the proposed algorithm by comparing it with six peer algorithms on four benchmark datasets. We also show that NPCA can be used effectively to capture meaningful biomarkers.

Conclusion
Our analysis suggests that NPCA effectively conducts local feature selection for mass spectral profiles and contributes both to improved sensitivity and specificity in subsequent classification and to meaningful biomarker discovery.
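
The abstract does not state the NPCA formulation. As a minimal sketch, assuming NPCA is posed as finding a nonnegative loading matrix U that maximizes the variance captured by the projection while a penalty keeps U close to orthonormal, it can be solved by projected gradient ascent. The objective, the penalty weight `alpha`, the step size `eta`, the synthetic data, and the function name `nonnegative_pca` are illustrative assumptions, not the authors' published algorithm. The intuition for locality is a plain fact: two nonnegative vectors are orthogonal only if their nonzero entries fall on disjoint features, so nonnegativity plus (near-)orthogonality favors loadings concentrated on a few features rather than spread over all of them as in PCA.

```python
import numpy as np


def nonnegative_pca(X, k, alpha=10.0, eta=0.01, n_iter=2000, seed=0):
    """Hypothetical NPCA sketch (assumed objective, not the paper's exact method):
    maximize 0.5*tr(U^T C U) - (alpha/4)*||I - U^T U||_F^2 subject to U >= 0,
    where C = Xc^T Xc (Xc = column-centered X) rescaled to unit spectral norm."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc
    C = C / np.linalg.norm(C, 2)          # largest eigenvalue of C becomes 1
    rng = np.random.default_rng(seed)
    U = 0.1 * np.abs(rng.standard_normal((X.shape[1], k)))  # nonnegative start
    I = np.eye(k)
    for _ in range(n_iter):
        grad = C @ U + alpha * U @ (I - U.T @ U)  # gradient of the objective above
        U = np.maximum(U + eta * grad, 0.0)       # ascent step, then project onto U >= 0
    return U


# Synthetic demo data (not from the paper): 6 "m/z features" in two blocks,
# each block driven by its own latent factor plus small noise.
rng = np.random.default_rng(1)
z = rng.standard_normal((200, 2)) * np.array([2.0, 1.0])
X = np.repeat(z, 3, axis=1) + 0.1 * rng.standard_normal((200, 6))

U = nonnegative_pca(X, k=2)
Xc = X - X.mean(axis=0)
features = Xc @ U            # reduced features built from nonnegative loadings
print(np.round(U, 3))
```

In a pipeline like the one the Methods describe, `features` would be the input to a classifier such as an SVM; the sparse-coding step is not reproduced in this sketch.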

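The abstract names an NPCA-based filter-wrapper biomarker capturing algorithm but does not give its steps. The sketch below only illustrates the generic filter-wrapper pattern under stated assumptions: each feature has a filter score (in the paper derived from NPCA in a way not described here; the demo substitutes an absolute two-sample t-statistic), the filter keeps the top `m` features, and the wrapper greedily adds features while leave-one-out accuracy improves. A nearest-centroid classifier stands in for the SVM to keep the example dependency-free. The names `filter_wrapper_select` and `loo_nearest_centroid_accuracy` and the synthetic data are hypothetical.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xtr, ytr = X[mask], y[mask]
        c0 = Xtr[ytr == 0].mean(axis=0)
        c1 = Xtr[ytr == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / len(y)


def filter_wrapper_select(X, y, filter_scores, m=5):
    """Hypothetical filter-wrapper: keep the m top-scoring features (filter),
    then greedily add features while LOO accuracy strictly improves (wrapper)."""
    candidates = [int(j) for j in np.argsort(filter_scores)[::-1][:m]]
    selected, best_acc = [], 0.0
    while candidates:
        accs = [loo_nearest_centroid_accuracy(X[:, selected + [j]], y) for j in candidates]
        j_best = int(np.argmax(accs))
        if accs[j_best] <= best_acc:
            break                           # no remaining candidate improves accuracy
        best_acc = accs[j_best]
        selected.append(candidates.pop(j_best))
    return selected, best_acc


# Synthetic two-class demo (not from the paper): feature 3 is a shifted "peak".
rng = np.random.default_rng(0)
n, p = 60, 20
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, 3] += 3.0

# Stand-in filter score: absolute two-sample t-statistic for each feature.
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
v0, v1 = X[y == 0].var(axis=0, ddof=1), X[y == 1].var(axis=0, ddof=1)
t_abs = np.abs(m1 - m0) / np.sqrt(v0 / (n // 2) + v1 / (n // 2))

selected, acc = filter_wrapper_select(X, y, t_abs, m=5)
print(selected, acc)
```

Separating a cheap filter from a more expensive wrapper keeps the number of classifier evaluations bounded by the filtered candidate count `m` rather than by the full number of m/z features.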