Feature selection using genetic algorithms
Microarray data is very important for identifying complex diseases and developing diagnostic kits, and it is of considerable help to cancer research in particular. Consequently, a large number of biological and medical researchers have to deal with datasets obtained from microarray experiments. Using these huge datasets in their entirety is inefficient in terms of both time and cost. As a result, many researchers work on tumor classification through the effective use of microarray technologies in cancer research. Obtaining the most relevant subset of genes, namely the signature genes that lie on the pathway of a given disease and are therefore capable of classifying the entire dataset, is crucial for accurate disease diagnosis. Several approaches to this classification problem exist in the literature. In this thesis, we present an approach that uses Genetic Algorithms for this feature subset selection problem. The Genetic Algorithm is combined with Support Vector Machines, which are used to calculate the classification accuracy of each gene. These classification accuracies serve as the survival probabilities of the genes in our algorithm: genes with higher classification accuracy have a higher probability of surviving. Three different real-life cancer datasets are used for the tests. Our algorithm converged to better results than all other approaches in the literature. On the colon tumor dataset, one of our test datasets, we were able to classify the entire dataset with 100% accuracy using only 4 features (genes). On the prostate cancer dataset, we classified the data with 100% accuracy using 3 features. Finally, we tested our Genetic Algorithm on an ovarian cancer dataset and found only 3 significant features out of 15154 genes, again with 100% accuracy.
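The GA/SVM combination described above can be illustrated with a minimal sketch. It is one possible reading, not the thesis implementation, and rests on several assumptions. A chromosome is a fixed-size subset of `k` genes. Fitness is the accuracy of a linear SVM on a single held-out split. The SVM is trained with Pegasos-style subgradient descent in pure Python, standing in for whatever SVM library the thesis used. Single-gene SVM accuracies act as per-gene survival weights, and subsets are chosen by roulette-wheel (fitness-proportional) selection. Seeding with the top-ranked genes, elitism, union crossover and the 0.2 mutation rate are illustrative choices. All function names (`make_toy_data`, `train_linear_svm`, `accuracy`, `weighted_subset`, `ga_select`) are hypothetical, and the data is synthetic, not the colon, prostate or ovarian datasets.

```python
import random

def make_toy_data(n_samples=40, n_genes=20, seed=1):
    """Hypothetical toy data: gene 0 separates the two classes, all others are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
        row[0] = label * (2.0 + rng.random())  # informative "signature" gene
        X.append(row)
        y.append(label)
    half = n_samples // 2
    return (X[:half], y[:half]), (X[half:], y[half:])  # (train, test)

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style subgradient descent on the L2-regularized
    hinge loss. A constant 1.0 is appended to each sample, so the last weight
    acts as a (regularized) bias."""
    rng = random.Random(seed)
    Xb = [row + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer step
            if margin < 1.0:  # hinge loss active: move toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def accuracy(genes, data):
    """Fitness: held-out accuracy of an SVM trained on the given genes only."""
    (Xtr, ytr), (Xte, yte) = data
    def pick(X):
        return [[row[g] for g in genes] for row in X]
    w = train_linear_svm(pick(Xtr), ytr)
    correct = 0
    for row, label in zip(pick(Xte), yte):
        score = sum(wj * xj for wj, xj in zip(w, row + [1.0]))
        correct += (1 if score > 0 else -1) == label
    return correct / len(yte)

def weighted_subset(weights, k, rng, start=()):
    """Extend `start` to k distinct genes, drawing each new gene with
    probability proportional to its weight."""
    chosen = list(dict.fromkeys(start))[:k]
    while len(chosen) < k:
        g = rng.choices(range(len(weights)), weights=weights)[0]
        if g not in chosen:
            chosen.append(g)
    return chosen

def ga_select(data, n_genes, k=3, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    # Single-gene SVM accuracy serves as each gene's survival weight.
    gene_acc = [accuracy([g], data) for g in range(n_genes)]
    weights = [a + 1e-3 for a in gene_acc]  # keep every gene reachable
    ranked = sorted(range(n_genes), key=lambda g: -gene_acc[g])
    pop = [ranked[:k]] + [weighted_subset(weights, k, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((accuracy(c, data), c) for c in pop), key=lambda p: -p[0])
        subsets = [c for _, c in scored]
        fits = [f + 1e-3 for f, _ in scored]
        nxt = [subsets[0]]  # elitism: carry the best subset over unchanged
        while len(nxt) < pop_size:
            # Roulette-wheel selection: survival probability proportional to accuracy.
            a, b = rng.choices(subsets, weights=fits, k=2)
            child = rng.sample(sorted(set(a) | set(b)), k)  # union crossover
            if rng.random() < 0.2:  # mutation: redraw the last gene from the weighted pool
                child = weighted_subset(weights, k, rng, start=child[:-1])
            nxt.append(child)
        pop = nxt
    best_acc, best = max(((accuracy(c, data), c) for c in pop), key=lambda p: p[0])
    return best, best_acc, gene_acc

if __name__ == "__main__":
    data = make_toy_data()
    best, best_acc, _ = ga_select(data, n_genes=20)
    print("best gene subset:", best, "holdout accuracy:", best_acc)
```

Because of elitism, the best subset found so far is never lost between generations. Seeding one chromosome with the top-ranked single genes gives the search a strong starting point. On real microarray data with thousands of genes, cross-validation and an established SVM library would probably be more appropriate than one holdout split and this hand-rolled solver.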