Extracting Conserved Gene Expression Motifs from Gene Expression Data

We propose a representation for gene expression data called conserved gene expression motifs or XMOTIFs. A gene's expression level is conserved across a set of samples if the gene is expressed with the same abundance in all the samples. A conserved gene expression motif is a subset of genes that is simultaneously conserved across a subset of samples. We present a computational technique to discover large conserved gene motifs that cover all the samples and classes in the data. When applied to published data sets representing different cancers or disease outcomes, our algorithm constructs XMOTIFS that distinguish between the various classes.

[1]  Jinyan Li,et al.  Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. , 2002 .

[2]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[3]  Nir Friedman,et al.  Inferring subnetworks from perturbed expression profiles , 2001, ISMB.

[4]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[5]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[6]  T. M. Murali,et al.  A Monte Carlo algorithm for fast projective clustering , 2002, SIGMOD '02.

[7]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[8]  Roded Sharan,et al.  Center CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis , 2000, ISMB.

[9]  Ron Shamir,et al.  Clustering Gene Expression Patterns , 1999, J. Comput. Biol..

[10]  S. P. Fodor,et al.  Multiplexed biochemical assays with biological chips , 1993, Nature.

[11]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[12]  Michal Linial,et al.  Using Bayesian Networks to Analyze Expression Data , 2000, J. Comput. Biol..

[13]  G. Getz,et al.  Coupled two-way clustering analysis of gene microarray data. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[14]  J. Hartigan Direct Clustering of a Data Matrix , 1972 .

[15]  R. Sharan,et al.  CLICK: a clustering algorithm with applications to gene expression analysis. , 2000, Proceedings. International Conference on Intelligent Systems for Molecular Biology.

[16]  R. Tibshirani,et al.  Clustering methods for the analysis of DNA microarray data , 1999 .

[17]  Ash A. Alizadeh,et al.  'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns , 2000, Genome Biology.

[18]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[19]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[20]  Simon Kasif,et al.  A System for Induction of Oblique Decision Trees , 1994, J. Artif. Intell. Res..

[21]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.