An Ant Colony Optimization Based Dimension Reduction Method for High-Dimensional Datasets

In this paper, we propose Ant Colony Optimization-Selection (ACO-S), a dimension reduction method for high-dimensional datasets that is based on a bionic optimization algorithm. Because microarray datasets comprise tens of thousands of features (genes), they are commonly used to test dimension reduction techniques. ACO-S consists of two stages, which use two well-known ACO algorithms, ant system and ant colony system, respectively, to search for genes. In the first stage, a modified ant system filters nonsignificant genes out of the high-dimensional space, and a number of promising genes are reserved for the next stage. In the second stage, an improved ant colony system is applied to gene selection. To enhance the search ability of the ACO algorithms, we propose a method for calculating a priori heuristic information and design a fuzzy logic controller that dynamically adjusts the number of ants in ant colony system. Furthermore, we devise a second fuzzy logic controller to tune the parameter q0 in ant colony system. We evaluate the performance of ACO-S on five microarray datasets, whose dimensions vary from 7129 to 12000. We also compare ACO-S with four existing well-known bionic optimization algorithms. The comparison shows that ACO-S generates the smallest gene subsets with salient features while yielding high classification accuracy. Comparative results for ACO-S with different classifiers are also given. The proposed method is shown to be a promising and effective tool for mining high-dimensional data.
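To illustrate the role of q0 in ant colony system, the sketch below shows the standard pseudo-random proportional rule applied to picking the next gene. It is a minimal sketch, not the ACO-S implementation: the function name `choose_next_feature`, the per-gene `pheromone` and `heuristic` dictionaries, and the exponent `beta` are assumptions made for illustration, and the fuzzy logic controllers that would adjust q0 and the number of ants are not modeled.

```python
import random


def choose_next_feature(candidates, pheromone, heuristic, q0, beta=2.0, rng=random):
    """Pick one feature from a non-empty list of candidates.

    With probability q0 the ant exploits (takes the best-scoring feature);
    otherwise it explores by sampling in proportion to the scores.
    All names here are hypothetical and chosen for illustration only.
    """
    # Score each candidate as pheromone * heuristic^beta.
    scores = {f: pheromone[f] * (heuristic[f] ** beta) for f in candidates}

    # Exploitation: greedy choice of the highest score.
    if rng.random() < q0:
        return max(scores, key=scores.get)

    # Exploration: roulette-wheel selection proportional to the scores.
    total = sum(scores.values())
    threshold = rng.random() * total
    cumulative = 0.0
    for feature, score in scores.items():
        cumulative += score
        if threshold <= cumulative:
            return feature
    return feature  # fallback for floating-point rounding at the upper edge


if __name__ == "__main__":
    pheromone = {"g1": 1.0, "g2": 1.0, "g3": 1.0}
    heuristic = {"g1": 0.1, "g2": 0.9, "g3": 0.5}
    print(choose_next_feature(["g1", "g2", "g3"], pheromone, heuristic, q0=0.9))
```

A larger q0 makes the ants more greedy, while a smaller q0 spreads the search over more genes, which is the trade-off a controller tuning q0 would have to manage.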
