Framework for Knowledge-Based Integrative Analysis of Microarray Data

Whole-genome DNA microarrays make it possible to monitor the expression of nearly all genes in an organism and are widely used in scientific and industrial fields. The challenge no longer lies in obtaining the data, but in interpreting the results to reveal biologically significant mechanisms. A recently established method, GSEA [1], assesses whether a priori defined gene sets show statistically significant, concordant differences between two biological states. This knowledge-based, module-level analysis has proved superior to traditional single-gene methods [2], a conclusion further supported by several improved methods based on the GSEA concept. However, GSEA was designed to work on a ranked list of genes [3], so knowledge-based analysis of other data types remains a challenge. In this study, we propose a framework for gene set analysis of three major data types: ranked genes, clustered genes, and signature genes. We further extend these methods to de novo motif discovery within almost the same framework. Analysis of real microarray data showed that biologically significant results could be recovered. The R scripts for Knowledge-based Integrative Analysis of Microarray data (KIAM) are available upon request from the authors.
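The abstract does not specify the statistics used for each data type, so the following is only a minimal sketch of two standard gene set tests, not the KIAM implementation. It assumes an unweighted running-sum (Kolmogorov-Smirnov-like) score for a ranked gene list and an upper-tail hypergeometric test for the overlap between a signature gene list and a gene set. The function names `enrichment_score` and `overlap_p_value` and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score for a gene set in a ranked list.

    Walks down the ranked list, stepping up on set members and down on
    non-members, and returns the running sum with the largest magnitude.
    """
    hits = set(gene_set) & set(ranked_genes)
    n, n_hit = len(ranked_genes), len(hits)
    if n_hit == 0 or n_hit == n:
        return 0.0  # no contrast between members and non-members
    hit_step = 1.0 / n_hit
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit_step if gene in hits else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def overlap_p_value(signature, gene_set, universe_size):
    """Upper-tail hypergeometric P(overlap >= observed).

    Assumes both lists are drawn from one universe of `universe_size` genes.
    """
    signature, gene_set = set(signature), set(gene_set)
    observed = len(signature & gene_set)
    n_sig, n_set = len(signature), len(gene_set)
    total = comb(universe_size, n_sig)
    upper = min(n_sig, n_set)
    tail = sum(
        comb(n_set, i) * comb(universe_size - n_set, n_sig - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # toy ranking, best first
    print(enrichment_score(ranked, {"g1", "g2"}))
    print(overlap_p_value({"a", "b", "c"}, {"a", "b", "d", "e"}, 10))
```

Under the same assumptions, clustered genes could be handled by applying `overlap_p_value` to each cluster in turn, treating the cluster membership as the signature list. In practice, significance for the running-sum score would be assessed by permutation, and p-values across many gene sets would need multiple-testing correction.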

[1]  A. Butte,et al.  Coordinated reduction of genes of oxidative metabolism in humans with insulin resistance and diabetes: Potential role of PGC1 and NRF1 , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Eliot Marshall,et al.  Getting the Noise Out of Gene Arrays , 2004, Science.

[3]  Xiaohui S. Xie,et al.  Errα and Gabpa/b specify PGC-1α-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle , 2004 .

[4]  Daniel G Tenen,et al.  ATRA resolves the differentiation block in t(15;17) acute myeloid leukemia by restoring PU.1 expression. , 2004, Blood.

[5]  Jill P. Mesirov,et al.  GSEA-P: a desktop application for Gene Set Enrichment Analysis , 2007, Bioinform..

[6]  M. Ramalho Santos.  Stemness: Transcriptional Profiling of Embryonic and Adult Stem Cells , 2002 .

[7]  P. Park,et al.  Discovering statistically significant pathways in expression profiling studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Olivier Elemento,et al.  Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach , 2005, Genome Biology.

[9]  K. Lindblad-Toh,et al.  Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals , 2005, Nature.

[10]  K. Petersen,et al.  Impaired mitochondrial activity in the insulin-resistant offspring of patients with type 2 diabetes. , 2004, The New England journal of medicine.

[11]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Jaeyoung Kim,et al.  Identifying Biologically Significant Pathways by Gene Set Enrichment Analysis Using Fisher's Criterion , 2008, 2008 Second International Conference on Future Generation Communication and Networking.

[13]  T. Barrette,et al.  ONCOMINE: a cancer microarray database and integrated data-mining platform. , 2004, Neoplasia.

[14]  Seon-Young Kim,et al.  PAGE: Parametric Analysis of Gene Set Enrichment , 2005, BMC Bioinform..

[15]  T. Barrette,et al.  Mining for regulatory programs in the cancer transcriptome , 2005, Nature Genetics.

[16]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[17]  C. Ball,et al.  Identification of genes periodically expressed in the human cell cycle and their expression in tumors. , 2002, Molecular biology of the cell.

[18]  Yudong D. He,et al.  Functional Discovery via a Compendium of Expression Profiles , 2000, Cell.

[19]  Ji Zhang,et al.  Component plane presentation integrated self‐organizing map for microarray data analysis , 2003, FEBS letters.

[20]  M. Daly,et al.  PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes , 2003, Nature Genetics.

[21]  N. Slonim,et al.  A universal framework for regulatory element discovery across all genomes and data types. , 2007, Molecular cell.

[22]  Gene Ontology Consortium.  The Gene Ontology (GO) database and informatics resource , 2003 .

[23]  Todd R Golub,et al.  Gene expression–based high-throughput screening (GE-HTS) and application to leukemia differentiation , 2004, Nature Genetics.

[24]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[25]  T. Golub,et al.  Transformation from committed progenitor to leukaemia stem cell initiated by MLL–AF9 , 2006, Nature.

[26]  T. Ley,et al.  Commonly Dysregulated Genes in Murine APL Cells , 2006 .

[27]  Xiaohui Xie,et al.  Erralpha and Gabpa/b specify PGC-1alpha-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[28]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.