Comparing Time Series Transcriptome Data Between Plants Using A Network Module Finding Algorithm

Comparative transcriptome analysis compares expression patterns between homologous genes in different species. Because most mechanistic molecular studies in plants have been performed in model species such as Arabidopsis and rice, comparative transcriptome analysis is particularly important for the functional annotation of genes in other plant species. Many biological processes, such as embryo development, are highly conserved among plant species. A key challenge, however, is establishing a one-to-one mapping between the developmental stages of two species. In this protocol, we address this problem by converting gene expression patterns into a co-expression network and then applying network module-finding algorithms to the resulting cross-species co-expression network. We describe how to perform this analysis using bash scripts for preliminary data processing and R scripts that implement a simulated annealing method for module finding. We also provide instructions for visualizing the resulting co-expression networks across species.
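The network-based idea above can be illustrated with a small, self-contained sketch. It is written in Python rather than the bash/R pipeline the protocol describes, and it is not the authors' implementation. All of the following are assumptions made for illustration: within each species, genes are linked when their time-series profiles have Pearson r ≥ 0.9; hypothetical ortholog pairs link genes across species; and modules are found by maximizing Newman modularity on the resulting undirected network with a basic simulated annealing schedule (single-node relabeling moves, geometric cooling). Gene IDs, expression values, and parameter defaults are made up.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def build_network(expr_by_species, ortholog_pairs, threshold=0.9):
    """Return an undirected edge set (frozensets of two gene IDs).

    Assumption: within each species, genes are linked when r >= threshold
    (positive co-expression only); ortholog pairs link genes across species.
    """
    edges = set()
    for profiles in expr_by_species.values():
        for g1, g2 in combinations(profiles, 2):
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                edges.add(frozenset((g1, g2)))
    for g1, g2 in ortholog_pairs:
        edges.add(frozenset((g1, g2)))
    return edges


def modularity(edges, labels):
    """Newman modularity Q = sum over modules c of [L_c/m - (d_c/(2m))^2].

    L_c counts edges inside module c; d_c is the total degree of c's nodes.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal, degree_sum = {}, {}
    for e in edges:
        a, b = tuple(e)
        for node in (a, b):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[a] == labels[b]:
            internal[labels[a]] = internal.get(labels[a], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def anneal_modules(nodes, edges, t0=0.1, cooling=0.9995, steps=20000, seed=0):
    """Maximize modularity by simulated annealing; return best partition seen."""
    rng = random.Random(seed)
    nodes = list(nodes)
    labels = {n: i for i, n in enumerate(nodes)}  # start: one module per node
    q = modularity(edges, labels)
    best_labels, best_q = dict(labels), q
    t = t0
    for _ in range(steps):
        node = rng.choice(nodes)
        old = labels[node]
        labels[node] = rng.randrange(len(nodes))  # any label, possibly unused
        new_q = modularity(edges, labels)
        dq = new_q - q
        # Accept improvements always; accept worse moves with prob exp(dq/T).
        if dq >= 0 or rng.random() < math.exp(dq / t):
            q = new_q
            if q > best_q:
                best_labels, best_q = dict(labels), q
        else:
            labels[node] = old  # reject the move
        t *= cooling  # geometric cooling schedule
    return best_labels, best_q


# Hypothetical toy data: two species, four time points per gene.
expr = {
    "species_A": {"A1": [1, 2, 3, 4], "A2": [2, 3, 4, 5.5],
                  "A3": [4, 3, 2, 1], "A4": [5, 3.5, 2, 1]},
    "species_B": {"B1": [1, 2.5, 3, 4], "B2": [0.5, 1, 2, 3],
                  "B3": [4, 3, 1.5, 1], "B4": [6, 4, 2, 1]},
}
orthologs = [("A1", "B1"), ("A3", "B3")]  # hypothetical ortholog pairs

edges = build_network(expr, orthologs)
genes = [g for profiles in expr.values() for g in profiles]
labels, q = anneal_modules(genes, edges)
print(f"Q = {q:.3f}")
for gene in sorted(genes):
    print(gene, labels[gene])
```

On this toy network, the highest-modularity partition places the rising genes of both species (A1, A2, B1, B2) in one module and the falling genes in another, with Q = 0.5. Because annealing is stochastic, a given run is not guaranteed to reach that optimum, so the function returns the best partition it visited. In this sketch, modules that contain genes from both species are what connect comparable expression programs, and hence comparable stages, across species.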
