Research on Methodology of Classification Mining for Tumor Markers

Reliability is a key issue in data mining. For the massive protein mass spectrum data produced by SELDI-TOF-MS, this paper proposes an effective and reliable method for extracting tumor markers. First, an adaptive threshold approach based on the wavelet transform is used to remove noise from the raw data, providing a reliable foundation for tumor marker extraction. Next, a genetic algorithm based on SVM is designed to construct a discriminant model that finds the optimal combination of distinctive protein peaks and thereby identifies tumor markers. Finally, the proposed method is applied to protein mass spectrum data from the sera of normal mice and of mice with induced pancreatic cancer, in order to verify its feasibility and reliability.
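
To make the denoising step concrete, the following is a minimal sketch of wavelet threshold denoising on a one-dimensional spectrum. It is an illustration under stated assumptions, not the method of this paper: the abstract does not specify the wavelet or the adaptive threshold rule, so the sketch substitutes a Haar wavelet and the standard universal threshold, with the noise level estimated from the median absolute value of the finest detail coefficients. The function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `denoise`) and the `levels` parameter are hypothetical.

```python
import math
import statistics


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero out those below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, levels=3):
    """Denoise a spectrum by soft-thresholding its Haar detail coefficients.

    Assumption: the threshold is the universal threshold
    sigma * sqrt(2 * ln(n)), with sigma estimated from the finest-level
    details. This stands in for the paper's unspecified adaptive rule.
    """
    n = len(signal)
    if n == 0 or n % (2 ** levels) != 0:
        raise ValueError("signal length must be a positive multiple of 2**levels")

    # Decompose: details[0] holds the finest scale.
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)

    # Robust noise estimate from the finest detail coefficients.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(n))

    # Reconstruct from the coarsest level down, using thresholded details.
    for detail in reversed(details):
        approx = haar_inverse(approx, soft_threshold(detail, threshold))
    return approx
```

Calling `denoise(intensities, levels=3)` on a list of intensity values returns a smoothed list of the same length. In practice a smoother wavelet and a level-dependent threshold would usually be preferred for mass spectra, since the Haar wavelet tends to produce step-like artifacts around narrow peaks.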