Mass spectrometry is an important technique for chemical profiling and a major tool in proteomics, the discipline concerned with large-scale studies of the proteins expressed by an organism. In this paper we propose using a sparse coding algorithm to classify mass spectrometry serum protein profiles of colorectal cancer patients and healthy individuals, following the so-called self-taught learning approach. Applied to a dataset of 112 spectra, each 4731 bins long, the sparse coding algorithm represents each spectrum using fewer than ten prototype spectra. The spectra are classified as in our previous study on the same dataset [ADM09], using Support Vector Machines evaluated by double cross-validation. However, the classifiers take as input the sparse coding coefficients rather than discrete wavelet coefficients. Comparing the classification results with the reference results, we show that, at the same total recognition rate, the sparse coding-based procedure achieves higher generalization performance. Moreover, we propose using the sparse coding coefficients for clustering mass spectra and demonstrate that this approach highlights differences between the cancer spectra.
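To make the idea of representing a spectrum by a few prototype spectra concrete, the following is a minimal sketch of the coefficient step only. It assumes the prototypes are already given and solves the L1-penalised least-squares problem with plain iterative soft thresholding (ISTA), which is swapped in here for simplicity and is not claimed to be the solver used in the paper. The names `sparse_code` and `soft_threshold`, the penalty `lam`, and the toy 6-bin data are all illustrative assumptions.

```python
# Sketch: sparse coefficients of one spectrum over a FIXED set of prototypes,
# obtained by ISTA for  min_a 0.5 * ||x - sum_j a[j] * p_j||^2 + lam * ||a||_1.
# Prototypes, lam and the toy data are assumptions for illustration only.

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, prototypes, lam=0.1, n_iter=500):
    """Return coefficients a with x approximately sum_j a[j] * prototypes[j]."""
    k, n = len(prototypes), len(x)
    # Step size 1/L; the squared Frobenius norm of the prototype matrix is a
    # safe (if loose) upper bound on the Lipschitz constant of the gradient.
    L = sum(v * v for p in prototypes for v in p) or 1.0
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual of the current reconstruction.
        r = [sum(a[j] * prototypes[j][i] for j in range(k)) - x[i]
             for i in range(n)]
        # Gradient of the quadratic term with respect to each coefficient.
        g = [sum(prototypes[j][i] * r[i] for i in range(n)) for j in range(k)]
        # Gradient step followed by soft thresholding.
        a = [soft_threshold(a[j] - g[j] / L, lam / L) for j in range(k)]
    return a


# Toy example: three prototype "spectra" of 6 bins; x uses only the first two.
prototypes = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
x = [2.0, 0.0, 0.5, 0.5, 0.0, 0.0]
coeffs = sparse_code(x, prototypes, lam=0.05)
```

In this toy case the coefficient of the unused third prototype stays exactly zero, which is the sparsity the L1 penalty is meant to produce. A full sparse coding procedure would also learn the prototypes from the data; that step is omitted here.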
[1] Rajat Raina, et al. Self-taught learning: transfer learning from unlabeled data. ICML '07, 2007.
[2] D. Lorenz, et al. An active set approach to the elastic-net and its applications in mass spectrometry. 2009.
[3] Theodore Alexandrov, et al. SparseCodePicking: feature extraction in mass spectrometry using sparse coding algorithms. 0907.3426, 2009.
[4] Bart J. A. Mertens, et al. Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation. Bioinform., 2009.
[5] Cordelia Schmid, et al. High-dimensional data clustering. Comput. Stat. Data Anal., 2006.
[6] Cornelis J. H. van de Velde, et al. Detection of colorectal cancer using MALDI-TOF serum protein profiling. European Journal of Cancer, 2006.
[7] John Shawe-Taylor, et al. Generalization Performance of Support Vector Machines and Other Pattern Classifiers. 1999.
[8] Rajat Raina, et al. Efficient sparse coding algorithms. NIPS, 2006.