A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization

An effective approach to cancer classification based on gene expression monitoring with DNA microarrays was introduced in [1]. That study analysed primary breast tumours from 78 young patients with no tumour cells in local lymph nodes at diagnosis: 34 from patients who developed distant metastases within 5 years (poor prognosis group) and 44 from patients who remained disease-free for at least 5 years (good prognosis group). A three-step supervised classification based on correlation methods was then applied to identify a gene expression signature strongly predictive of a short interval to distant metastasis (“poor prognosis” signature). We address the same problem with a Support Vector Machine (SVM), because this method has already performed well in cancer classification problems and we expect it to yield slightly better results. We also address the selection of a small subset of genes from the initial set (~25000). For gene selection we use an SVM-based method, Recursive Feature Elimination (RFE), instead of the correlation-based feature ranking used in [1], because the latter does not take into account mutual information between features during feature selection, which could affect classification performance [2].
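The recursive elimination idea can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a linear SVM trained by plain subgradient descent on the hinge loss, the squared weight w_j^2 as ranking criterion (a common choice for linear SVM-RFE that the text above does not specify), and removal of one feature per iteration. The function names, hyperparameters and the synthetic data, which stand in for the microarray measurements, are all hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit (w, b) by full-batch subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep):
    """Retrain the SVM on the surviving features and drop the one with
    the smallest squared weight until n_keep features remain.
    Returns (surviving feature indices, eliminated indices in order)."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated

# Synthetic stand-in data (an assumption, not microarray values):
# features 0 and 1 carry the class signal, features 2 to 5 are noise.
rng = random.Random(0)
labels = [1, -1] * 20
samples = [
    [3 * lab + rng.uniform(-1, 1), 3 * lab + rng.uniform(-1, 1)]
    + [rng.uniform(-1, 1) for _ in range(4)]
    for lab in labels
]

selected, eliminated = svm_rfe(samples, labels, n_keep=2)
print("selected features:", selected)
print("elimination order:", eliminated)
```

Because the classifier is retrained after every removal, each ranking reflects how the remaining features act together, whereas a correlation-based ranking scores every feature in isolation. With roughly 25000 genes, removing one feature per iteration would be slow, so in practice several features are often removed per step.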