Two-Step Gene Feature Selection Algorithm Based on Permutation Test

To filter out noisy and redundant genes, this paper presents a two-step gene feature selection algorithm based on the permutation test. Because it relies on the permutation test, the proposed algorithm can select genes efficiently and process large datasets quickly. Twelve datasets from the RSCTC 2010 Discovery Challenge and two widely used classifiers, SVM and PAM, are used to evaluate its performance. The experimental results show that the proposed algorithm efficiently selects a small gene subset with high discriminative power and low redundancy.
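The abstract does not spell out the two steps or the test statistic, so the following is only a minimal sketch of one plausible reading, not the paper's actual procedure. It assumes that step one keeps genes whose permutation-test p-value falls below a threshold, and that step two greedily discards genes highly correlated with ones already kept. The function names, the statistic (absolute difference of class means), and the parameters `alpha`, `max_corr`, and `n_perm` are all hypothetical choices made for illustration.

```python
import numpy as np

def permutation_p_values(X, y, n_perm=200, seed=0):
    """Per-gene permutation p-values (assumed statistic: |mean_1 - mean_0|).

    X: array of shape (n_samples, n_genes); y: binary labels (0 or 1).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    def stat(labels):
        return np.abs(X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0))

    observed = stat(y)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        # Shuffling labels breaks any real gene/class association.
        exceed += stat(rng.permutation(y)) >= observed
    # Add-one correction keeps p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)

def two_step_select(X, y, alpha=0.05, max_corr=0.9, n_perm=200, seed=0):
    """Hypothetical two-step selection: relevance filter, then redundancy filter."""
    X = np.asarray(X, dtype=float)
    p = permutation_p_values(X, y, n_perm, seed)
    # Step 1 (assumed): keep genes with p <= alpha, most significant first.
    candidates = [j for j in np.argsort(p, kind="stable") if p[j] <= alpha]
    # Step 2 (assumed): drop a gene if it is strongly correlated with a kept one.
    selected = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_corr for k in selected):
            selected.append(int(j))
    return selected
```

Under these assumptions, a gene that nearly duplicates an already selected informative gene passes the first step but is removed in the second, which is the sense in which the resulting subset has low redundancy.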
