BAR expressolog identification: expression profile similarity ranking of homologous genes in plant species.

Large numbers of sequences are now readily available for many plant species, allowing easy identification of homologous genes. However, identifying orthologous genes across multiple species is complicated by evolutionary events such as whole-genome or segmental duplications. Several developmental atlases of gene expression have been produced in the past few years, and these transcript abundance data may help to refine ortholog predictions. In this study, clusters of homologous genes among seven plant species (Arabidopsis, soybean, Medicago truncatula, poplar, barley, maize and rice) were identified. Equivalent tissues were then determined between species, and a pipeline was devised to rank homologs within gene clusters by both sequence and expression profile similarity, with the best expression profile match being termed the 'expressolog'. Five electronic fluorescent pictograph (eFP) browsers were produced as part of this effort, to aid in visualization of gene expression data and to complement existing eFP browsers at the Bio-Array Resource (BAR). Within the eFP browser framework, these expression profile similarity rankings were incorporated into an Expressolog Tree Viewer to allow cross-species homolog browsing by both sequence and expression pattern similarity. Global analyses showed that orthologs with the highest sequence similarity do not necessarily exhibit the highest expression pattern similarity. Other orthologs may show different expression patterns, indicating that such genes may require re-annotation or more specific annotation. Ultimately, it is envisaged that this pipeline will aid in improving the functional annotation of genes and in translational plant research.
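The ranking step can be illustrated with a minimal sketch. It assumes that expression profiles have already been reduced to the same ordered set of equivalent tissues in both species, and that profile similarity is measured with the Pearson correlation coefficient; the abstract does not name the metric, so that choice, the gene identifiers, the tissue order and all values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        # A flat profile carries no pattern information; treat as uncorrelated.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rank_homologs(query_profile, candidates):
    """Rank the homologs in one cluster by expression similarity to the query.

    candidates maps gene id -> (sequence_similarity, expression_profile).
    Returns (gene id, sequence_similarity, correlation) tuples, best
    expression match first; the first entry plays the role of the expressolog.
    """
    scored = [
        (gene, seq_sim, pearson(query_profile, profile))
        for gene, (seq_sim, profile) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical data: expression in equivalent tissues (root, leaf, flower, seed).
query = [10, 200, 50, 5]
cluster = {
    "homolog_A": (92.0, [300, 20, 10, 40]),  # highest sequence similarity
    "homolog_B": (78.0, [12, 180, 60, 8]),   # lower sequence similarity
}

ranked = rank_homologs(query, cluster)
best_by_expression = ranked[0][0]
best_by_sequence = max(cluster, key=lambda gene: cluster[gene][0])
print(best_by_expression, best_by_sequence)
```

In this toy cluster the homolog with the highest sequence similarity is not the best expression match, which is the situation the global analyses describe.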
