Mining gene expression databases for association rules

MOTIVATION: Global gene expression profiling, at both the transcript and the protein level, can be a valuable tool for understanding genes, biological networks, and cellular states. As ever larger gene expression data sets become available, data mining techniques can be applied to identify patterns of interest in the data. Association rules, widely used in market basket analysis, can be applied to the analysis of expression data as well. Association rules can reveal biologically relevant associations between different genes, or between environmental effects and gene expression. An association rule has the form LHS --> RHS, where LHS and RHS are disjoint sets of items and the RHS set is likely to occur whenever the LHS set occurs. Items in gene expression data can include genes that are highly expressed or repressed, as well as relevant facts describing the cellular environment of the genes (e.g. the diagnosis of the tumor sample from which a profile was obtained).

RESULTS: We demonstrate an algorithm for efficiently mining association rules from gene expression data, using the data set of 300 yeast expression profiles from Hughes et al. (2000, Cell, 102, 109-126). Using the algorithm, we found numerous rules in the data. A cursory analysis of some of these rules reveals numerous associations between genes; many make biological sense, while others suggest new hypotheses that may warrant further investigation. In a data set derived from the yeast data set, but with the expression values for each transcript randomly shifted with respect to the experiments, no rules were found, indicating that nearly all of the rules mined from the actual data set are unlikely to have occurred by chance.

AVAILABILITY: An implementation of the algorithm using Microsoft SQL Server with Access 2000 is available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/assoc_rules.zip.
Our results from mining the yeast data set are available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/yeast_results.zip.
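The abstract describes association rules only in outline, and the authors' implementation runs inside Microsoft SQL Server with Access 2000, which is not reproduced here. As a minimal sketch of the general technique, the Python below assumes a dictionary mapping each gene to its log expression ratios across experiments, a symmetric up/down threshold of 0.2 (an illustrative value, not one taken from the paper), and support counted as the number of profiles that contain an itemset. It discretizes profiles into items, mines frequent itemsets with a textbook Apriori-style level-wise search (a swapped-in standard method, not necessarily the authors' algorithm), and derives rules LHS --> RHS with their confidence. It also includes one possible reading of the randomization control, in which each gene's values are circularly shifted by an independent random offset. All function names, gene names, and values are hypothetical.

```python
import random
from itertools import combinations


def to_transactions(profiles, threshold=0.2):
    """One item set per experiment: 'gene+' if value >= threshold,
    'gene-' if value <= -threshold (threshold is an assumption)."""
    n_experiments = len(next(iter(profiles.values())))
    transactions = []
    for j in range(n_experiments):
        items = set()
        for gene, values in profiles.items():
            if values[j] >= threshold:
                items.add(gene + "+")
            elif values[j] <= -threshold:
                items.add(gene + "-")
        transactions.append(frozenset(items))
    return transactions


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Keep size-k unions whose (k-1)-subsets are all frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """(lhs, rhs, support, confidence) with confidence =
    support(LHS u RHS) / support(LHS). Subsets of frequent itemsets are
    frequent, so itemsets[lhs] always exists."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


def shift_profiles(profiles, rng):
    """Circularly shift each gene's values by its own random offset
    (an assumed reading of 'randomly shifted with respect to the experiments')."""
    shifted = {}
    for gene, values in profiles.items():
        offset = rng.randrange(len(values))
        shifted[gene] = values[offset:] + values[:offset]
    return shifted


# Toy log ratios for three hypothetical genes across five experiments.
profiles = {
    "geneA": [0.5, 0.6, -0.1, 0.4, 0.0],
    "geneB": [0.4, 0.7, 0.0, 0.5, -0.1],
    "geneC": [0.3, 0.1, -0.3, 0.25, 0.0],
}
itemsets = frequent_itemsets(to_transactions(profiles), min_support=2)
rules = association_rules(itemsets, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "-->", sorted(rhs), f"support={support} confidence={confidence:.2f}")

# Randomization control: re-mine after shifting each gene independently.
shifted = shift_profiles(profiles, random.Random(0))
shifted_rules = association_rules(
    frequent_itemsets(to_transactions(shifted), min_support=2), min_confidence=0.9
)
print("rules in real data:", len(rules), "| rules in shifted data:", len(shifted_rules))
```

Environmental facts such as a tumor diagnosis could be handled the same way, by adding an item such as "diagnosis=tumor" to the relevant transaction. On a toy data set this small the real-versus-shifted comparison says little; it only illustrates the shape of the control.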

[1] J. Diffley et al. A close relative of the nuclear, chromosomal high-mobility group protein HMG1 in yeast mitochondria. 1991, Proceedings of the National Academy of Sciences of the United States of America.

[2] D. C. Torney et al. Discovery of association rules in medical data. 2001, Medical Informatics and the Internet in Medicine.

[3] Raghu Ramakrishnan et al. Database Management Systems. 1976.

[4] J. Mesirov et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. 1999, Proceedings of the National Academy of Sciences of the United States of America.

[5] T. Megraw et al. SHM1: a multicopy suppressor of a temperature-sensitive null mutation in the HMG1-like abf2 gene. 1996, Yeast.

[6] J. Lechner et al. The Saccharomyces cerevisiae kinetochore. 1996, FEBS Letters.

[7] D. Botstein et al. Cluster analysis and display of genome-wide expression patterns. 1998, Proceedings of the National Academy of Sciences of the United States of America.

[8] J. Sterling et al. Yeast and human genes that affect the Escherichia coli SOS response. 1999, Proceedings of the National Academy of Sciences of the United States of America.

[9] Yudong D. He et al. Functional discovery via a compendium of expression profiles. 2000, Cell.

[10] M. Thattai et al. Intrinsic noise in gene regulatory networks. 2001, Proceedings of the National Academy of Sciences of the United States of America.

[11] Nicos Maglaveras et al. Mining association rules from clinical databases: an intelligent diagnostic process in healthcare. 2001, MedInfo.

[12] G. Church et al. Systematic determination of genetic network architecture. 1999, Nature Genetics.

[13] F. Messenguy et al. Control-mechanisms acting at the transcriptional and post-transcriptional levels are involved in the synthesis of the arginine pathway carbamoylphosphate synthase of yeast. 1983, The EMBO Journal.

[14] M. Werner-Washburne et al. The highly conserved, coregulated SNO and SNZ gene families in Saccharomyces cerevisiae respond to nutrient limitation. 1998, Journal of Bacteriology.

[15] V. Contamine et al. Maintenance and integrity of the mitochondrial genome: a plethora of nuclear genes in the budding yeast. 2000, Microbiology and Molecular Biology Reviews.

[16] M. H. Saier et al. Multidrug-resistant transport proteins in yeast: complete inventory and phylogenetic characterization of yeast open reading frames within the major facilitator superfamily. 1997, Yeast.

[17] Kiyoji Nishiwaki et al. Structure of the yeast HIS5 gene responsive to general control of amino acid biosynthesis. 1987, Molecular and General Genetics.

[18] Michal Linial et al. Using Bayesian networks to analyze expression data. 2000, Journal of Computational Biology.

[19] Y. Pekarsky et al. Crystal structure of the worm NitFhit Rosetta Stone protein reveals a Nit tetramer binding two Fhit dimers. 2000, Current Biology.

[20] M. Künzler et al. Activation and repression of the yeast ARO3 gene by global transcription factors. 1995, Molecular Microbiology.

[21] M. De Rijcke et al. The ARG11 gene of Saccharomyces cerevisiae encodes a mitochondrial integral membrane protein required for arginine biosynthesis. 1996, The Journal of Biological Chemistry.

[22] Y. Pekarsky et al. Nitrilase and Fhit homologs are encoded as fusion proteins in Drosophila melanogaster and Caenorhabditis elegans. 1998, Proceedings of the National Academy of Sciences of the United States of America.