Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data.

The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypotheses about the molecular mechanisms that control gene expression and biological processes (Fig. 1). A successful strategy for generating tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypotheses derived from network analyses can be found in the literature, spanning different organisms, including plants, and specific fields such as root developmental biology. To make the construction of a gene co-expression network more accessible to biologists, we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for operating widely used open-source platforms such as Bio-Linux, R, and Cytoscape. Although the data used in this example were obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to RNA-seq data from any organism.
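To make the idea of an undirected co-expression graph concrete, the following is a minimal sketch in Python and not the workflow of this guide, which relies on Bio-Linux, R, and Cytoscape. It assumes that expression values have already been normalized, that Pearson correlation is the similarity measure, and that an absolute correlation of 0.9 or higher defines an edge. The gene names, the expression values, and the 0.9 cutoff are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene with constant expression carries no co-expression signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    The threshold is an assumed cutoff, not a value taken from the guide.
    """
    edges = []
    # Each unordered gene pair is visited once, so the graph is undirected.
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Hypothetical normalized expression values across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = coexpression_edges(expression)

# Print a tab-delimited edge list: source, target, correlation.
for gene_a, gene_b, r in edges:
    print(f"{gene_a}\t{gene_b}\t{r}")
```

In this toy data set, geneA, geneB, and geneC are tightly correlated or anti-correlated and end up connected, while geneD stays isolated. A tab-delimited edge list of this kind is one common way to pass a network to a visualization tool such as Cytoscape.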
