A Modified SVM Method for Analyzing Metabonomics Data from HPLC-MS

High-performance liquid chromatography-mass spectrometry (HPLC-MS) is an effective analytical technique that has been used in many applications, such as proteomics and metabolomics. Because HPLC-MS data usually contain hundreds or more variables, including noisy and irrelevant information, selecting the meaningful information becomes critical. Support vector machine recursive feature elimination (SVM-RFE) is a popular feature selection technique based on the support vector machine (SVM), and it has been successfully applied to the analysis of biological data. In SVM-RFE, the Filter-out-Factor m, the number of bottom-ranked features deleted in each loop, influences the performance of the algorithm: different values of m yield different selected feature subsets, so the performances of the corresponding SVM classification models can differ considerably. To obtain stable results when processing high-dimensional HPLC-MS data, we propose an improved SVM-RFE method based on a dynamic Filter-out-Factor (SVM-RFE-DFF). In each loop, only those features that lie in a specific window and do not contribute to improving the classification performance are eliminated. To demonstrate the usefulness of SVM-RFE-DFF, we applied it to metabonomics data on metabolic syndrome and liver diseases from a UPLC/Q-TOF MS platform. The results showed that SVM-RFE-DFF outperforms SVM-RFE in discriminating patients from healthy controls.
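To make the role of the Filter-out-Factor concrete, the following is a minimal sketch of plain SVM-RFE (not the proposed DFF variant, whose window rule is not specified here). It assumes the usual SVM-RFE criterion of ranking features by the squared weights of a linear SVM. The linear SVM is trained with the Pegasos subgradient method (no bias term) as a simple stand-in for a proper SVM solver. The names `train_linear_svm`, `svm_rfe`, `lam`, `epochs` and `n_keep` are hypothetical; the parameter `m` plays the role of the Filter-out-Factor.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Hypothetical helper: linear SVM (no bias) trained with Pegasos.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns the weight vector w.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos step size
            margin = y[i] * X[i].dot(w)
            w *= 1.0 - eta * lam            # shrinkage from the L2 regularizer
            if margin < 1.0:                # hinge-loss subgradient step
                w += eta * y[i] * X[i]
    return w


def svm_rfe(X, y, m=1, n_keep=1):
    """Rank features by SVM-RFE, deleting the m bottom-ranked features per loop.

    Returns feature indices from most to least important (the n_keep
    features that are never eliminated come first, in original order).
    """
    surviving = list(range(X.shape[1]))
    eliminated = []                          # removal order, least important first
    while len(surviving) > n_keep:
        w = train_linear_svm(X[:, surviving], y)
        scores = w ** 2                      # ranking criterion: squared weight
        k = min(m, len(surviving) - n_keep)  # do not drop below n_keep features
        worst = np.argsort(scores)[:k]       # positions of the k lowest scores
        eliminated.extend(surviving[p] for p in worst)
        drop = set(worst.tolist())
        surviving = [f for p, f in enumerate(surviving) if p not in drop]
    return surviving + eliminated[::-1]
```

Calling `svm_rfe(X, y, m=1)` and `svm_rfe(X, y, m=2)` on the same data and comparing the top-ranked subsets is a quick way to observe the dependence on m that the abstract describes. A larger m retrains the SVM fewer times but removes features on the basis of a single, possibly noisy, weight vector.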