Challenges and prospects in the analysis of large-scale gene expression data

Large, heterogeneous gene expression data sets spanning a wide variety of cellular conditions hold the promise of a global view of transcriptional regulation. While standard analysis methods have been successfully applied to smaller data sets, large-scale data pose specific challenges that have prompted the development of new and more sophisticated approaches. This paper focuses on one such approach (the Signature Algorithm) and discusses the central challenges in the analysis of large data sets and how they might be overcome. Biological questions that have been addressed using the Signature Algorithm are highlighted, and a summary of other important methods from the literature is provided.
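Roughly, the Signature Algorithm is a two-step search. It starts from a set of input genes and scores each condition by how coherently those genes respond, keeping the highest-scoring conditions. It then scores every gene by its expression across the selected conditions and keeps the top-scoring genes. The result is a module of co-regulated genes together with the conditions in which they are co-regulated. The Python below is a minimal sketch of that idea, not the published implementation. The function names, the row-wise z-score normalization, the use of signed condition scores as weights, and the threshold parameters `t_cond` and `t_gene` (in units of standard deviation) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list to mean 0 and population standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def signature(expr, input_genes, t_cond=1.0, t_gene=1.0):
    """Hypothetical sketch of a signature-style module search.

    expr: dict mapping gene name -> list of expression values, one per condition.
    input_genes: genes assumed to be co-regulated (the seed set).
    t_cond, t_gene: assumed thresholds, in standard deviations.
    Returns (selected condition indices, selected genes).
    """
    genes = list(expr)
    n_cond = len(next(iter(expr.values())))

    # Assumption: normalize each gene across conditions (row z-scores).
    norm = {g: zscore(expr[g]) for g in genes}

    # Step 1: score each condition by the mean normalized expression of the
    # input genes, and keep conditions whose score is extreme in either direction.
    cond_scores = [mean(norm[g][c] for g in input_genes) for c in range(n_cond)]
    z_cond = zscore(cond_scores)
    selected = [c for c in range(n_cond) if abs(z_cond[c]) >= t_cond]
    if not selected:
        return [], []

    # Step 2: score every gene over the selected conditions, weighting each
    # condition by its (signed) score, so genes that move with the seed set
    # score high and genes that move against it score low.
    gene_scores = [mean(cond_scores[c] * norm[g][c] for c in selected) for g in genes]
    z_gene = zscore(gene_scores)
    module = [g for g, z in zip(genes, z_gene) if z >= t_gene]
    return selected, module
```

For example, in a toy matrix where genes `A`, `B` and `C` rise together in the first two of six conditions while `D`, `E` and `F` follow unrelated patterns, calling `signature(expr, ["A", "B"], t_cond=1.0, t_gene=0.5)` selects conditions 0 and 1 and returns the module `["A", "B", "C"]`, picking up `C` even though it was not in the seed set. The two thresholds are the main knobs: lowering them trades specificity for larger modules.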
