Comparative Analysis of DNA Microarray Data through the Use of Feature Selection Techniques

One of today’s most important scientific research topics is discovering the genetic links between cancers. This paper compares three cancers (breast, colon, and lung) by applying feature selection techniques to a data set built from DNA microarray data containing samples from all three. Eighteen feature rankers were applied to the data, each ordering the genes by importance with respect to a target cancer. This process was repeated three times, once per target cancer. The rankings were then compared across cancers while holding the feature ranker fixed. The cancers were evaluated for matching genes both in pairs and all together. The comparison shows a strong correlation between the two known hereditary cancers, breast and colon, and little correlation between lung cancer and the other two. This is the first study to apply eighteen different feature rankers in a bioinformatics case study; eleven of these rankers were recently proposed and implemented by our research team.
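
The comparison step described above can be illustrated with a minimal sketch. It assumes that, for one fixed feature ranker, each cancer has a list of genes ordered from most to least important, and that "matching genes" means genes appearing in the top k of more than one list. The cutoff `k`, the function name `top_k_overlap`, and the gene identifiers are hypothetical and chosen only for illustration; the abstract does not state how matches were counted.

```python
from itertools import combinations


def top_k_overlap(rankings, k):
    """Compare per-cancer gene rankings produced by one feature ranker.

    rankings: dict mapping cancer name -> list of gene ids, best first.
    k: how many top-ranked genes to keep per cancer (an assumed cutoff).
    Returns (pairwise, shared_by_all), where pairwise maps each pair of
    cancers to the set of genes in both top-k lists.
    """
    top = {name: set(genes[:k]) for name, genes in rankings.items()}
    pairwise = {
        (a, b): top[a] & top[b]
        for a, b in combinations(sorted(top), 2)
    }
    shared_by_all = set.intersection(*top.values())
    return pairwise, shared_by_all


# Hypothetical rankings for a single ranker; gene ids are made up.
rankings = {
    "breast": ["g1", "g2", "g3", "g4", "g5"],
    "colon": ["g2", "g1", "g6", "g3", "g7"],
    "lung": ["g8", "g9", "g1", "g10", "g11"],
}

pairwise, shared = top_k_overlap(rankings, k=3)
for pair, genes in pairwise.items():
    print(pair, sorted(genes))
print("all three:", sorted(shared))
```

Repeating this for each of the eighteen rankers would give one set of pairwise and three-way overlaps per ranker, which matches the design of holding the ranker fixed while varying the cancers being compared.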
