Reconstructing differentially co-expressed gene modules and regulatory networks of soybean cells

Background: Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell's transcriptional machinery can respond optimally to internal or external stimuli. This provides an opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks.

Results: We developed and integrated a set of computational methods for differential gene expression analysis, gene clustering, gene network inference, gene function prediction, and DNA motif identification to automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods identified 42 coherent gene modules, within which the average gene expression correlation coefficients are greater than 0.8, and reconstructed their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that the bioinformatic methods used can help elucidate the molecular responses of soybean cells to various environmental stresses.

Conclusions: The bioinformatics methods and genome-wide data sources for gene expression, clustering, regulation, and function analysis were integrated seamlessly into one modular protocol to systematically analyze and infer modules and networks from only the differentially expressed genes in soybean cells grown under stress conditions. Our approach appears to effectively reduce the complexity of the problem, and is sufficiently robust and accurate to generate a rather complete and detailed view of the putative soybean gene transcription logic potentially underlying the responses to the various environmental challenges. The same automated method can also be applied to reconstruct differentially co-expressed gene modules and their regulatory networks from gene expression data of any other transcriptome.
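
The coherence criterion mentioned in the results (average within-module expression correlation greater than 0.8) can be illustrated with a minimal sketch. The abstract does not say which correlation measure was used or how the average was taken, so the sketch assumes Pearson correlation averaged over all gene pairs in a module. The function names and the toy expression values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / (norm_x * norm_y)


def mean_pairwise_correlation(module):
    """Average Pearson correlation over all gene pairs in a module.

    `module` is a list of expression profiles, one per gene, each
    measured over the same conditions.
    """
    if len(module) < 2:
        raise ValueError("a module needs at least two genes")
    pairs = list(combinations(module, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def is_coherent(module, threshold=0.8):
    """Assumed reading of the criterion: average correlation above 0.8."""
    return mean_pairwise_correlation(module) > threshold


# Toy data (hypothetical): three genes that rise and fall together ...
base = [1.0, 2.0, 4.0, 3.0, 5.0]
coherent_module = [
    base,
    [2 * v + 1 for v in base],
    [v + d for v, d in zip(base, [0.1, -0.1, 0.1, -0.1, 0.1])],
]
# ... and three genes with conflicting profiles.
mixed_module = [base, [-v for v in base], base[::-1]]

print(is_coherent(coherent_module))
print(is_coherent(mixed_module))
```

Averaging over gene pairs is only one plausible reading; averaging each gene's correlation with the module's mean profile would be an equally reasonable alternative and may give different values near the threshold.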
