SFGD: a comprehensive platform for mining functional information from soybean transcriptome data and its use in identifying acyl-lipid metabolism pathways

Background: Soybean (Glycine max L.) is one of the world's most important leguminous crops, producing high-quality protein and oil. Increasing the relative oil concentration in soybean seeds is a goal of many researchers, but a complete platform for functional annotation of the genes involved in the soybean acyl-lipid pathway is still lacking. Following the success of soybean whole-genome sequencing, functional annotation has become a major challenge for the scientific community. Whole-genome transcriptome analysis is a powerful way to predict genes with biological functions. It is therefore essential to build a comprehensive analysis platform that integrates soybean whole-genome sequencing data, the available transcriptome data and protein information. Such a platform could also be used to identify acyl-lipid metabolism pathways.

Description: In this study, we describe our construction of the Soybean Functional Genomics Database (SFGD) using the Generic Genome Browser (Gbrowse) as the core platform. We integrated microarray expression profiling with 255 samples from 14 groups' experiments and mRNA-seq data with 30 samples from four groups' experiments, including spatial and temporal transcriptome data for different soybean development stages and environmental stresses. The SFGD includes a gene co-expression regulatory network containing 23,267 genes and 1873 miRNA-target pairs, and a group of acyl-lipid pathways containing 221 enzymes and more than 1550 genes. The SFGD also provides key analysis tools, namely BLAST search, expression pattern search and cis-element significance analysis, as well as gene ontology information search and single nucleotide polymorphism display.

Conclusion: The SFGD is a comprehensive database that integrates genome and transcriptome data as well as soybean acyl-lipid metabolism pathways. It provides useful toolboxes for biologists to improve the accuracy and robustness of soybean functional genomics analysis, further improving understanding of gene regulatory networks for effective crop improvement. The SFGD is publicly accessible at http://bioinformatics.cau.edu.cn/SFGD/, with all data available for downloading.
