An Agent-Based Clustering Approach for Gene Selection in Gene Expression Microarray

Gene selection is a major research area in microarray analysis that seeks to discover genes that are differentially expressed for a particular target annotation. Such genes, often called informative genes, can differentiate tissue samples belonging to different classes of the disease under study. Although many gene selection methods have been proposed, the complexity of the problem remains a challenge. This research proposes a gene selection approach based on a clustering-based multi-agent system. The proposal combines different filter methods with gene clustering, managed through coordinated agents, to discover informative gene subsets. To assess the reliability of our approach, we used four well-known public gene expression datasets: two lung cancer datasets, a colon cancer dataset, and a leukemia dataset. The results were validated through cluster validity measures, visual analytics, and a classifier, and were compared with those of other gene selection methods; together these support the reliability of our proposal.
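
As a rough illustration of how a filter step and a gene clustering step can be combined, the following minimal sketch ranks genes by class separation and then groups correlated genes, keeping one representative per group. It is not the authors' system: the abstract does not name the filter methods, the clustering algorithm, or the agent coordination scheme, so the Welch t-statistic, the greedy correlation clustering, the function names, the parameters `top_k` and `corr_threshold`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups of expression values."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    denom = sqrt(var_a / len(a) + var_b / len(b))
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
    return num / den if den > 0 else 0.0


def select_genes(expr, labels, top_k=3, corr_threshold=0.9):
    """Filter genes by class separation, then cluster the survivors.

    expr:   dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels, one per sample
    Returns (representatives, clusters).
    """
    # Filter step (assumed): score each gene by how well it separates the classes.
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = welch_t(group0, group1)
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # Clustering step (assumed): a gene joins the first cluster whose leading
    # gene it correlates with strongly; otherwise it starts a new cluster.
    clusters = []
    for gene in ranked:
        for cluster in clusters:
            if abs(pearson(expr[gene], expr[cluster[0]])) >= corr_threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])

    # Genes are visited in descending score order, so the first gene of each
    # cluster is its best-scoring member and serves as the representative.
    return [cluster[0] for cluster in clusters], clusters


# Hypothetical toy data: four genes measured on six samples (three per class).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [1.1, 1.0, 1.2, 4.0, 4.5, 3.8],
    "g3": [3.0, 1.0, 2.0, 4.5, 6.0, 3.5],
    "g4": [2.0, 3.0, 2.5, 2.4, 2.9, 2.2],
}
representatives, clusters = select_genes(toy_expr, toy_labels)
print(representatives, clusters)
```

On this toy data the uninformative gene `g4` is filtered out and `g2`, whose profile closely tracks `g1`, is absorbed into the cluster led by `g1`. A real pipeline would replace both steps with the specific filters and clustering method chosen for the study and would validate the resulting subset with a classifier.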
