A Feature Selection Method based on SVM and ReliefF and its Application in the Analysis of HPLC-MS Data
High-performance liquid chromatography-mass spectrometry (HPLC-MS) has proven to
be a powerful tool in metabolomic studies. Because HPLC-MS data are high-dimensional,
many multivariate analysis techniques, such as principal component analysis, partial
least-squares discriminant analysis, random forest, and support vector machine, have
been applied to process them.
Support vector machine (SVM) [1] is a very popular classification method based on
statistical learning theory. In constructing the learning model, it also assigns a weight
to each variable. However, HPLC-MS data usually contain hundreds of variables, some of
which are unrelated to the problem; these can distort the resulting hyperplane and, in
turn, the variable weights. To select the most informative variables from
HPLC-MS data, we combine SVM with ReliefF [2] to perform recursive feature
elimination (SVM-RFE-ReliefF). In each iteration, both the SVM weights and the ReliefF
values are computed, and a proportion of the features ranked low by the two
measures is deleted. A metabonomics dataset of liver diseases from a UPLC/Q-TOF
MS platform, containing 2428 ion features and 60 samples (30 cirrhosis patients and
30 hepatocellular carcinoma (HCC) patients), was used to demonstrate the performance
of our method. To validate the selected features, 30 control samples were also collected.
The results showed that the accuracy of our method in distinguishing HCC from cirrhosis
was 98.17%±0.95%, which is better than the 97.5%±1.62% obtained with SVM recursive
feature elimination (SVM-RFE). This suggests that our method can select more
discriminative features than SVM-RFE.
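
The elimination loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the two rankings are merged by summing rank positions, 20% of the remaining features are dropped per iteration (the hypothetical parameter `drop_frac`), the linear SVM is trained with a simple Pegasos-style subgradient method without a bias term, and ReliefF is reduced to a deterministic two-class variant that visits every sample. All function names are illustrative.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (no bias term).
    Labels must be -1 or +1. Returns the weight vector."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def relieff(X, y, k=3):
    """Simplified two-class ReliefF: every sample is used once, with its
    k nearest hits and k nearest misses under a range-scaled L1 distance."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
            for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    scores = [0.0] * d
    for i in range(n):
        dist = sorted((sum(diff(X[i], X[m], j) for j in range(d)), m)
                      for m in range(n) if m != i)
        hits = [m for _, m in dist if y[m] == y[i]][:k]
        misses = [m for _, m in dist if y[m] != y[i]][:k]
        for j in range(d):
            # Differing from same-class neighbours lowers the score,
            # differing from other-class neighbours raises it.
            scores[j] -= sum(diff(X[i], X[m], j) for m in hits) / (max(len(hits), 1) * n)
            scores[j] += sum(diff(X[i], X[m], j) for m in misses) / (max(len(misses), 1) * n)
    return scores


def ranks(values):
    """Rank position of each value; 0 marks the smallest (least important)."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    r = [0] * len(values)
    for pos, j in enumerate(order):
        r[j] = pos
    return r


def svm_rfe_relieff(X, y, n_keep, drop_frac=0.2):
    """Recursively drop the features ranked lowest by the summed SVM and
    ReliefF ranks (the merging rule is an assumption of this sketch).
    Returns the indices of the surviving features."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        svm_rank = ranks([wj * wj for wj in w])  # SVM-RFE criterion w_j^2
        rel_rank = ranks(relieff(Xa, y))
        combined = [s + r for s, r in zip(svm_rank, rel_rank)]
        n_drop = max(1, min(int(drop_frac * len(active)), len(active) - n_keep))
        worst = set(sorted(range(len(active)), key=lambda p: combined[p])[:n_drop])
        active = [j for p, j in enumerate(active) if p not in worst]
    return active
```

Calling `svm_rfe_relieff(X, y, n_keep=20)` on a samples-by-features list of lists `X` with labels `y` in {-1, +1} returns the indices of the 20 retained features. Summing ranks is only one way to merge the two criteria; intersecting the two low-ranked sets or weighting one criterion more heavily would be equally consistent with the description above.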