An efficient two-stage gene selection method for microarray data

Gene selection is a key issue in the analysis of microarray data, which typically have small sample sizes and variable correlation among genes. The main objective of this paper is to select the most informative genes from thousands of strongly correlated genes. To this end, an efficient two-stage gene selection (TSGS) algorithm is proposed. In this algorithm, an L2-norm penalty is first introduced to achieve a grouping effect for highly correlated genes. To overcome the small-sample problem, the augmented data technique is then used to produce an augmented data set. Finally, the recently proposed two-stage algorithm is applied to the augmented data to select the most informative genes. Simulation results confirm the effectiveness of the proposed approach in comparison with the popular Elastic Net method.
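The link between the L2-norm penalty and the augmented data set can be illustrated with a small sketch. It assumes the standard, unscaled augmentation known from the Elastic Net literature, in which sqrt(lam2) times the identity matrix is appended to the design matrix and zeros are appended to the response; the exact construction used by TSGS is not stated in this abstract and may differ (for example, by a scaling factor). The names `augment`, `rss`, `lam2` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def augment(X, y, lam2):
    """Append sqrt(lam2) * I rows to X and p zeros to y (unscaled variant).

    Assumption: this is the textbook augmentation, not necessarily the
    exact one used by the TSGS algorithm.
    """
    p = len(X[0])
    root = math.sqrt(lam2)
    X_aug = [list(row) for row in X]
    X_aug += [[root if i == j else 0.0 for j in range(p)] for i in range(p)]
    y_aug = list(y) + [0.0] * p
    return X_aug, y_aug


def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return sum(
        (yi - sum(x * b for x, b in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )


# Toy data: n = 2 samples, p = 3 genes (p > n, as in microarray studies).
X = [[1.0, 0.9, 0.2], [0.8, 1.0, -0.1]]
y = [1.0, 0.5]
beta = [0.3, 0.2, -0.4]  # arbitrary coefficient vector
lam2 = 0.5               # hypothetical L2 penalty weight

X_aug, y_aug = augment(X, y, lam2)

# L2-penalized least-squares objective on the original data.
penalized = rss(X, y, beta) + lam2 * sum(b * b for b in beta)

# Plain least-squares objective on the augmented data.
augmented = rss(X_aug, y_aug, beta)
```

For any coefficient vector, `augmented` equals `penalized` up to floating-point error, so the L2 penalty is absorbed into an ordinary least-squares problem. The augmented design has n + p rows, which is why this device is commonly used when the number of samples is much smaller than the number of genes.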