An evolutionary approach for gene expression patterns

This study presents an evolutionary algorithm, called the heterogeneous selection genetic algorithm (HeSGA), for analyzing gene expression patterns in microarray data. Microarray technologies make it possible to monitor the expression levels of a large number of genes simultaneously. Gene clustering and gene ordering are important tasks in analyzing large volumes of microarray expression data. The proposed method solves the gene clustering and gene ordering problems simultaneously by integrating global and local search mechanisms. The resulting clustering and ordering information is used to identify functionally related genes and to infer genetic networks from large-scale microarray expression data. HeSGA was tested on eight microarray datasets ranging in size from 147 to 6221 genes. The clustering and visualization results indicate that HeSGA not only orders genes smoothly but also groups genes with similar expression profiles. Visualizations and a new scoring function based on predefined functional categories were used to assess the biological interpretation of the results produced by HeSGA and by other methods. These results suggest that HeSGA is a promising approach for analyzing gene expression patterns.
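The abstract does not spell out HeSGA's operators, so the following is only a minimal sketch of the general idea of ordering genes and then reading clusters off the ordering. It assumes that an ordering is scored by the total distance between adjacent expression profiles (a traveling-salesman-style open path), uses plain 2-opt as a stand-in for the local search component, and omits the heterogeneous-selection global search entirely. The functions `ordering_cost`, `two_opt` and `split_clusters`, the Euclidean distance, and the gap threshold used to cut the ordering into clusters are all hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]  # one gene's expression values across conditions


def ordering_cost(order: List[int], data: List[Profile]) -> float:
    """Assumed objective: sum of Euclidean distances between adjacent genes."""
    return sum(math.dist(data[order[i]], data[order[i + 1]])
               for i in range(len(order) - 1))


def two_opt(order: List[int], data: List[Profile]) -> List[int]:
    """Stand-in local search: reverse segments while that shortens the path."""
    best = list(order)
    best_cost = ordering_cost(best, data)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 2, len(best) + 1):
                # Reverse best[i:j]; for an open path this changes at most two links.
                candidate = best[:i] + best[i:j][::-1] + best[j:]
                cost = ordering_cost(candidate, data)
                if cost < best_cost - 1e-12:  # strict improvement guarantees termination
                    best, best_cost = candidate, cost
                    improved = True
    return best


def split_clusters(order: List[int], data: List[Profile],
                   threshold: float) -> List[List[int]]:
    """Hypothetical clustering rule: cut the ordering wherever the gap
    between adjacent genes exceeds `threshold`."""
    clusters = [[order[0]]]
    for a, b in zip(order, order[1:]):
        if math.dist(data[a], data[b]) > threshold:
            clusters.append([])
        clusters[-1].append(b)
    return clusters


if __name__ == "__main__":
    # Toy profiles: genes 0, 2, 4 behave alike, as do genes 1 and 3.
    genes = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0], [0.2, 0.1]]
    ordered = two_opt(list(range(len(genes))), genes)
    print(ordered, split_clusters(ordered, genes, threshold=1.0))
```

Reading clusters from contiguous runs of a smooth ordering is one simple way to obtain both outputs from a single search; a full evolutionary method would wrap such a local search inside a population-based global search.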
