Computational methods for the identification of differential and coordinated gene expression.

With the first complete 'draft' of the human genome sequence expected for Spring 2000, the three basic challenges for today's bioinformatics are, more than ever: (i) finding the genes; (ii) locating their coding regions; and (iii) predicting their functions. However, our capacity to interpret vertebrate genomic and transcript (cDNA) sequences by experimental or computational means lags far behind our raw sequencing power. Although current programs perform well at identifying internal coding exons, the precise 5'→3' delineation of transcription units (and promoters) still requires additional experiments. Similarly, functional predictions made by reference to previously characterized homologues leave >50% of human genes unannotated or classified in uninformative categories ('kinase', 'ATP-binding', etc.). In the context of functional genomics, large-scale gene expression studies using massive cDNA tag sequencing, two-dimensional gel proteome analysis or microarray technologies are the only approaches that provide genome-scale experimental information at a pace consistent with the progress of sequencing. Given the difficulty and cost of characterizing genes one by one, academic and industrial researchers increasingly rely on these methods to prioritize their studies and choose their targets. The study of expression patterns can also provide insight into gene function, reveal regulatory pathways, indicate side effects of drugs or serve as a diagnostic tool. In this article, I review the theoretical and computational approaches used to: (i) identify differentially expressed genes (across cell types, developmental stages, pathological conditions, etc.); (ii) identify genes expressed in a coordinated manner across a set of conditions; and (iii) delineate clusters of genes sharing coherent expression features, ultimately defining global biological pathways.

[108]  Steven Skiena,et al.  Identifying gene regulatory networks from experimental data , 2001, Parallel Comput..