Fostering Biological Relevance in Feature Selection for Microarray Data

As a theoretical basis of mRMR feature selection, we consider a more general feature-selection criterion, maximum dependency (MaxDep).1 In this case, we select the feature set S_m = {f_1, f_2, …, f_m} whose joint statistical distribution is maximally dependent on the distribution of the classification variable c. A convenient way to measure this statistical dependency is mutual information,

I(S_m; c) = ∫∫ p(S_m, c) log [ p(S_m, c) / ( p(S_m) p(c) ) ] dS_m dc,    (8)

where p(·) is the probability density function. The MaxDep criterion aims to select the features S_m that maximize equation 8. Unfortunately, the multivariate densities p(f_1, …, f_m) and p(f_1, …, f_m, c) are difficult to estimate accurately when the number of samples is limited, the usual circumstance in many feature-selection problems. However, using the standard multivariate mutual information

J(x_1, …, x_k) = ∫ p(x_1, …, x_k) log [ p(x_1, …, x_k) / ( p(x_1) ⋯ p(x_k) ) ] dx_1 ⋯ dx_k,    (9)

we can factorize equation 8 as

I(S_m; c) = J(S_m, c) − J(S_m).    (10)

Equation 10 is similar to the mRMR feature-selection criterion of equation 4: the second term requires that the features S_m be maximally independent of each other (that is, least redundant), while the first term requires every feature to be maximally dependent on c. In other words, the two key parts of mRMR feature selection are contained in MaxDep feature selection.

We've found that explicitly minimizing the redundancy term leads to dramatically better classification accuracy. For example, for the lymphoma data in figure 2a, the commonly used MaxRel features lead to 13 leave-one-out cross-validation (LOOCV) errors (about 86 percent accuracy) in the best case, whereas selecting more than 30 mRMR features results in only one LOOCV error (99.0 percent accuracy). For the lung cancer data in figure 2b, when more than 30 features are selected, mRMR features lead to approximately five LOOCV errors, while MaxRel features lead to approximately 10. We present more extensive results elsewhere.1,2 The performance of mRMR features is good, especially considering that the features are selected independently of any prediction method.

Extension

The mRMR feature-selection method is independent of class-prediction methods, so one can combine it with any particular prediction method.2 Because mRMR features offer broad coverage of the characteristic feature space, one can first use mRMR to narrow down the search space and then apply a more expensive wrapper feature-selection method at a significantly lower cost.
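The relevance-minus-redundancy trade-off above can be made concrete with a small sketch. The following is a minimal illustration of greedy incremental mRMR selection for discrete features, scoring each candidate by its relevance I(f; c) minus its mean mutual information with the already-selected features; the function names and the toy data are our own illustration, not code from the article.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Estimate I(x; y) in bits for two discrete 1-D arrays."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p_ab * n * n / (count_a * count_b) = p(a,b) / (p(a) p(b))
        mi += p_ab * np.log2(p_ab * n * n / (px[a] * py[b]))
    return mi

def mrmr(X, c, m):
    """Greedy mRMR: pick m feature columns of discrete X for class labels c.

    At each step, choose the unselected feature maximizing
    relevance I(f; c) minus mean redundancy with selected features.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], c) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]          # start with most relevant
    while len(selected) < m:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, k])
                                  for k in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy data: class c has two independent bits; feature 1 duplicates feature 0,
# feature 2 carries the complementary bit, feature 3 is pure noise.
c = np.repeat([0, 1, 2, 3], 50)
X = np.stack([c // 2, c // 2, c % 2,
              np.random.default_rng(0).integers(0, 2, 200)], axis=1)
selected = mrmr(X, c, 2)
```

On this toy data, a pure MaxRel ranking would score features 0, 1, and 2 identically, while mRMR skips the redundant duplicate (feature 1) and pairs feature 0 with the complementary feature 2.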
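The filter-then-wrapper combination can be sketched as follows: a cheap mRMR-style filter first narrows thousands of genes to a small candidate set, and a wrapper then searches only that set with the actual classifier. This is a minimal sketch, assuming a 1-nearest-neighbor classifier evaluated by leave-one-out cross-validation as the wrapper; the function names and data are our own illustration, not the article's implementation.

```python
import numpy as np

def loocv_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        d = np.sum((X - X[i]) ** 2, axis=1)
        d[i] = np.inf                      # exclude the held-out sample itself
        correct += y[int(np.argmin(d))] == y[i]
    return correct / n

def wrapper_forward(X, y, candidates, m):
    """Greedy forward wrapper selection over a pre-narrowed candidate set.

    'candidates' would come from the mRMR filter stage, so the expensive
    LOOCV loop runs over tens of features instead of thousands.
    """
    selected = []
    while len(selected) < m:
        best, best_acc = None, -1.0
        for j in candidates:
            if j in selected:
                continue
            acc = loocv_1nn_accuracy(X[:, selected + [j]], y)
            if acc > best_acc:
                best, best_acc = j, acc
        selected.append(best)
    return selected

# Toy data: feature 0 separates the two classes; feature 1 is noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 10)
X = np.column_stack([y * 2.0 + rng.normal(0, 0.1, 20),   # informative
                     rng.normal(0, 1.0, 20)])            # irrelevant
chosen = wrapper_forward(X, y, [0, 1], 1)
```

The wrapper's cost is dominated by the number of candidate features it must cross-validate, which is why pre-filtering with mRMR makes the combination affordable.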

[1] R. Kohavi et al., "Wrappers for Feature Subset Selection," Artificial Intelligence, 1997.
[2] C. Davatzikos et al., "A Bayesian Morphometry Algorithm," IEEE Trans. Medical Imaging, 2004.
[3] H. Liu et al., "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Trans. Knowledge and Data Engineering, 2005.
[4] H. Liu et al., "Efficiently Handling Feature Redundancy in High-Dimensional Data," Proc. KDD, 2003.
[5] G. Piatetsky-Shapiro et al., "Microarray Data Mining: Facing the Challenges," SIGKDD Explorations, 2003.
[6] C.H.Q. Ding et al., "Minimum Redundancy Feature Selection from Microarray Gene Expression Data," Proc. IEEE Computational Systems Bioinformatics Conf. (CSB), 2003.
[7] F. Long et al., "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Analysis and Machine Intelligence, 2003.
[8] H. Liu et al., "Redundancy Based Feature Selection for Microarray Data," Proc. KDD, 2004.
[9] H. Liu et al., "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, 2004.
[10] W.L. Ruzzo et al., "Improved Gene Selection for Classification of Microarrays," Proc. Pacific Symp. Biocomputing, 2002.