Evaluation of Gene Association Methods for Coexpression Network Construction and Biological Knowledge Discovery

Background: Constructing coexpression networks from large-scale gene expression data sets and analyzing them is an effective way to uncover new biological knowledge; however, the gene association methods used to construct these networks have not been thoroughly evaluated. Because different methods yield structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical.

Methods and Results: In this study, we compared eight gene association methods (Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson), focusing on their true knowledge discovery rates in associating pathway genes and in constructing coordination networks of regulatory genes. We also examined how the methods behave on microarray data with different properties, and whether the biological process under study affects their efficiency.

Conclusions: We found that the Spearman, Hoeffding, and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, as is the Distance Covariance method, which can detect gene pairs with multiple types of relationships. Several of our analyses clearly show that the Pearson and Distance Covariance methods behave differently from the other six methods. The efficiency of each method varies to some degree with the properties of the data and depends largely on the biological process, so a pre-analysis is needed to identify the best-performing method for gene association and coexpression network construction.
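To make the contrast between Pearson and the rank-based methods concrete, the following is a minimal sketch, not the implementation used in the study. It computes Pearson, Spearman, and Kendall (tau-a, without tie correction) for a single hypothetical gene pair whose relationship is monotonic but nonlinear. The expression values and all function names are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs / all pairs."""
    score = 0
    pairs = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
        pairs += 1
    return score / pairs


# Hypothetical expression profiles across six samples (made-up values):
# gene_b rises monotonically, but not linearly, with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [2.0 ** v for v in gene_a]

r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
r_kendall = kendall_tau_a(gene_a, gene_b)

print(f"Pearson  = {r_pearson:.3f}")
print(f"Spearman = {r_spearman:.3f}")
print(f"Kendall  = {r_kendall:.3f}")
```

On this toy pair the two rank-based measures reach their maximum because the ordering of samples is identical for both genes, while Pearson falls below 1 because the relationship is not linear. This illustrates only one way the methods can disagree; it says nothing about which method performs best on real microarray data.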
