Biogeography-based informative gene selection and cancer classification using SVM and Random Forests

Microarray cancer gene expression data are very high-dimensional. Reducing the dimensionality improves the overall analysis and classification performance. We propose two hybrid techniques for microarray gene expression analysis, Biogeography-Based Optimization with Random Forests (BBO-RF) and BBO with Support Vector Machines (BBO-SVM), both of which use gene ranking as a heuristic. This heuristic is obtained from an information gain filter ranking procedure. The BBO algorithm generates a population of candidate gene subsets, treated as habitats in an ecosystem, and applies migration and mutation across multiple generations of the population to improve classification accuracy. The fitness of each gene subset is assessed by the classifiers, SVM and Random Forests. The performance of these hybrid techniques is evaluated on three cancer gene expression datasets retrieved from the Kent Ridge Biomedical datasets collection and the LIBSVM data repository. Our results demonstrate that genes selected by the proposed techniques yield classification accuracies comparable to those of previously reported algorithms.
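The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: information gain is computed after a median split of each gene, migration rates are linear in fitness rank, and a leave-one-out nearest-centroid classifier stands in for SVM and Random Forests so that the example needs only the Python standard library. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(values, labels):
    """Information gain of one gene after a median split (assumed discretization)."""
    cut = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < cut]
    high = [y for v, y in zip(values, labels) if v >= cut]
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in (low, high) if p)

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier.
    Stand-in fitness function; the paper uses SVM and Random Forests."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += row[g]
            counts[label] = counts.get(label, 0) + 1
        def dist(label):
            return sum((X[i][g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))
        correct += min(sums, key=dist) == y[i]
    return correct / len(X)

def bbo_select(X, y, subset_size=3, pop_size=10, generations=15,
               p_mutate=0.1, seed=0):
    """BBO-style search over gene subsets; requires more genes than subset_size."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    # Heuristic: genes with higher information gain are drawn more often.
    weights = [info_gain([row[g] for row in X], y) + 1e-6 for g in range(n_genes)]

    def draw_gene(exclude):
        while True:
            g = rng.choices(range(n_genes), weights=weights)[0]
            if g not in exclude:
                return g

    def random_habitat():
        habitat = []
        while len(habitat) < subset_size:
            habitat.append(draw_gene(habitat))
        return habitat

    def fitness(habitat):
        return loo_accuracy(X, y, habitat)

    pop = [random_habitat() for _ in range(pop_size)]
    emigration = [pop_size - r for r in range(pop_size)]  # fitter habitats emigrate more
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        new_pop = [list(pop[0])]  # elitism: keep the best habitat unchanged
        for rank in range(1, pop_size):
            habitat = list(pop[rank])
            immigration = (rank + 1) / pop_size  # poorer habitats immigrate more
            for k in range(subset_size):
                if rng.random() < immigration:
                    source = rng.choices(range(pop_size), weights=emigration)[0]
                    if pop[source][k] not in habitat:
                        habitat[k] = pop[source][k]  # migrate one gene
                if rng.random() < p_mutate:
                    habitat[k] = draw_gene(habitat)  # heuristic-guided mutation
            new_pop.append(habitat)
        pop = new_pop
    best = max(pop, key=fitness)
    return sorted(best), fitness(best)

def make_toy_data(n_samples=20, n_genes=10, seed=1):
    """Synthetic data: gene 0 separates the two classes, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.random() for _ in range(n_genes)] for _ in range(n_samples)]
    for row, label in zip(X, y):
        row[0] += 10 * label
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print(bbo_select(X, y))
```

To move closer to the setup described above, `loo_accuracy` could be replaced by the cross-validated accuracy of an SVM or Random Forest from a library of the reader's choice; the rest of the loop would stay the same.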
