BAR expressolog identification: expression profile similarity ranking of homologous genes in plant species.

Large numbers of sequences are now readily available for many plant species, allowing easy identification of homologous genes. However, orthologous gene identification across multiple species is made difficult by evolutionary events such as whole-genome or segmental duplications. Several developmental atlases of gene expression have been produced in the past couple of years, and it may be possible to use these transcript abundance data to refine ortholog predictions. In this study, clusters of homologous genes among seven plant species (Arabidopsis, soybean, Medicago truncatula, poplar, barley, maize and rice) were identified. A pipeline was then devised to rank homologs within gene clusters by both sequence and expression profile similarity, with expression profiles compared across tissues determined to be equivalent between species; the best expression profile match was termed the 'expressolog'. Five electronic fluorescent pictograph (eFP) browsers were produced as part of this effort, to aid in visualization of gene expression data and to complement existing eFP browsers at the Bio-Array Resource (BAR). Within the eFP browser framework, these expression profile similarity rankings were incorporated into an Expressolog Tree Viewer to allow cross-species homolog browsing by both sequence and expression pattern similarity. Global analyses showed that orthologs with the highest sequence similarity do not necessarily exhibit the highest expression pattern similarity. Other orthologs may show different expression patterns, indicating that such genes may require re-annotation or more specific annotation. Ultimately, it is envisaged that this pipeline will aid in improving the functional annotation of genes and in translational plant research.
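
The expression-based ranking step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure the pipeline uses, so Pearson correlation is an assumption here, as are the function names (`pearson`, `rank_homologs`), the gene labels, and the made-up expression values. Each profile is assumed to be a list of expression levels ordered by the same set of equivalent tissues in both species.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Returns 0.0 when either profile has no variance (assumed convention).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_homologs(query_profile, homolog_profiles):
    """Rank homologs in one gene cluster by similarity to the query profile.

    homolog_profiles maps a gene label to its expression profile, measured
    over tissues assumed to be equivalent to those of the query.
    """
    scored = [
        (gene, pearson(query_profile, profile))
        for gene, profile in homolog_profiles.items()
    ]
    # Highest correlation first; the top entry is the candidate expressolog.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: four equivalent tissues, three homologs in one cluster.
query = [1.0, 2.0, 3.0, 4.0]
cluster = {
    "homolog_A": [2.0, 4.0, 6.0, 8.1],
    "homolog_B": [4.0, 3.0, 2.0, 1.0],
    "homolog_C": [1.0, 1.0, 1.0, 1.0],
}

ranking = rank_homologs(query, cluster)
expressolog = ranking[0][0]
```

In this toy cluster the homolog whose profile rises across tissues in step with the query ranks first, regardless of how its sequence similarity compares with the others. A sequence-based ranking would be computed separately and shown alongside it.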
