Feature selection for microarray data by AUC analysis

Microarray datasets typically contain a small number of samples but a large number of gene expression measurements. Dimensionality reduction through feature (gene) selection is therefore important for classification. In this paper, we apply a feature perturbation method that we introduced previously to select genes from microarray data. Our experiments use a publicly available colon cancer dataset. Compared with SVM-RFE, our method achieves higher accuracy with feature sets of 10 to 80 features; with fewer than 10 features, however, SVM-RFE is more accurate. We also analyze the area under the curve (AUC) of the feature perturbation method for the top 25 and top 50 features in order to determine the appropriate amount of noise to apply. We show that the feature perturbation method can find a small, effective set of features/genes.
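The abstract does not state exactly how the AUC is computed. The following is a minimal sketch built on three assumptions. The curve plots classification accuracy against the number of top-ranked features, from 1 up to k (k = 25 or 50). The area is computed with the trapezoidal rule and normalized by the x-range. The preferred noise level is the one whose curve has the largest area. The function names `curve_auc` and `pick_noise_level` are hypothetical, and this is an illustration of the idea rather than the authors' implementation.

```python
from typing import Dict, Sequence


def curve_auc(accuracies: Sequence[float]) -> float:
    """Normalized trapezoidal area under an accuracy-vs-feature-count curve.

    Assumption: accuracies[i] is the accuracy obtained with the top (i + 1)
    ranked features, so the x-axis runs from 1 to len(accuracies) in steps of 1.
    Dividing by the x-range means a flat curve at accuracy a has AUC a.
    """
    if len(accuracies) < 2:
        raise ValueError("need accuracies for at least two feature-set sizes")
    # Each unit-width trapezoid contributes the mean of its two end heights.
    area = sum((a + b) / 2.0 for a, b in zip(accuracies, accuracies[1:]))
    return area / (len(accuracies) - 1)


def pick_noise_level(curves: Dict[float, Sequence[float]], top_k: int) -> float:
    """Return the noise level whose curve, truncated to top_k features, has the largest AUC.

    Assumption: curves maps each candidate noise level to the accuracy curve
    produced by the feature perturbation ranking at that noise level.
    """
    return max(curves, key=lambda noise: curve_auc(curves[noise][:top_k]))
```

Because the comparison is made over a truncated curve, the chosen noise level can differ between the top-25 and top-50 analyses. This is one reason to examine both cutoffs rather than a single one.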