Clustering gene expression data via mining ensembles of classification rules evolved using MOSES

A novel approach, model-based clustering, is described for identifying complex interactions between genes or gene categories based on static gene expression data. The approach deals with categorical data, which consists of a set of gene expression profiles belonging to one category and a set belonging to another category. An evolutionary algorithm (Meta-Optimizing Semantic Evolutionary Search, or MOSES) is used to learn an ensemble of classification models distinguishing the two categories, based on inputs that are features corresponding to gene expression values. Each feature is associated with a model-based vector, which encodes quantitative information regarding the utilization of the feature across the ensemble of models. Two different ways of constructing these vectors are explored. These model-based vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model-based clusters, in which features are gathered together if they are often considered together by classification models, which may be because they are co-expressed or for subtler reasons involving multi-gene interactions. The method is illustrated by applying it to two datasets regarding human gene expression, one drawn from brain cells and pertinent to the neurogenetics of aging, and the other drawn from blood cells and relating to differentiating between types of lymphoma. We find that, compared to traditional expression-based clustering, the new method often yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and also yield novel and meaningful insights into the underlying biological processes.
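The pipeline described above (ensemble of models, one usage vector per feature, hierarchical clustering of those vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the ensemble is represented as a list of sets of feature indices, the vector for a feature is a simple binary "used by model j" indicator (the abstract does not specify its two constructions, so this need not match either), the distance is Jaccard, and plain average-linkage agglomerative clustering stands in for Omniclust, which is not reproduced here. The function names are hypothetical.

```python
from itertools import combinations


def model_based_vectors(ensemble, n_features):
    """Assumed construction: entry j of a feature's vector is 1 if model j uses it."""
    return [[1 if f in model else 0 for model in ensemble]
            for f in range(n_features)]


def jaccard_distance(u, v):
    """1 - |models using both| / |models using either|; 1.0 if neither is used."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 1.0


def average_linkage(vectors, n_clusters):
    """Plain agglomerative clustering (a stand-in for Omniclust, not Omniclust itself)."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(jaccard_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy ensemble (hypothetical): each model is the set of feature indices it uses.
ensemble = [{0, 1}, {0, 1, 4}, {2, 3}, {2, 3}, {0, 1}, {2, 3, 4}]
vectors = model_based_vectors(ensemble, n_features=5)
clusters = average_linkage(vectors, n_clusters=3)
print(clusters)
```

In this toy ensemble, features 0 and 1 always appear in the same models, as do features 2 and 3, so each pair has distance zero and is merged first. The grouping reflects co-usage by the classifiers, not similarity of expression values, which is the distinction the abstract draws against expression-based clustering.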
