Comparative study of RNA-seq- and Microarray-derived coexpression networks in Arabidopsis thaliana

MOTIVATION: Coexpression networks are data-derived representations of genes that behave similarly across tissues and experimental conditions. They have been used for hypothesis generation and for guilt-by-association approaches that infer the functions of previously uncharacterized genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset.

RESULTS: We collected 65 publicly available, high-quality Illumina RNA-seq samples from Arabidopsis thaliana and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show that Variance-Stabilizing Transformed (VST) RNA-seq data are the most similar to microarray data with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks score slightly higher in biology-derived quality assessments such as overlap with the known protein-protein interaction network and edge ontological agreement. Several coexpression network centralities are investigated; in particular, we show that betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. Finally, we focus on a specific gene network case, showing that although microarray data seem better suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome.
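The two central operations named in the abstract, building a Pearson correlation coexpression network and ranking genes by betweenness centrality, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the gene names, the toy expression values, the function names and the correlation cutoff are all hypothetical, and the betweenness routine is a plain implementation of Brandes' algorithm for unweighted graphs.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def coexpression_network(expr, threshold=0.8):
    """Link two genes when |r| >= threshold (the cutoff is an assumed value)."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an undirected,
    unweighted graph given as {node: set of neighbours}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # -1 marks "not yet visited"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected shortest path was counted once from each endpoint.
    return {gene: score / 2 for gene, score in bc.items()}


# Hypothetical expression profiles (rows: genes, columns: four samples).
toy_expr = {
    "geneA": [6, 4, 6, 4],
    "geneB": [7, 5, 5, 3],
    "geneC": [6, 6, 4, 4],
}
network = coexpression_network(toy_expr, threshold=0.7)
centrality = betweenness(network)
```

In this toy case geneB correlates with both geneA and geneC (r of about 0.71), while geneA and geneC are uncorrelated, so geneB is the only gene lying on a shortest path between two others and receives the highest betweenness. On real data the choice of cutoff and of expression transformation (for example VST for RNA-seq counts) would strongly affect the resulting topology.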
