High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

Significance

Genes must be annotated with their correct functions if genome data are to support hypothesis building and metabolic engineering. PlantSEED was developed to streamline the annotation of plant genome sequences, to construct metabolic models automatically from genome annotations, and to use those models to test the annotations, allowing the detection of gaps and errors in gene annotations and the prediction of new functions. PlantSEED is designed to grow iteratively by including new plant genome sequences, new annotations harvested from the literature, and improved biochemical data, all of which are integrated consistently into the PlantSEED genomes and metabolic models.

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, using gapfilling to identify missing annotations and proposing gene candidates for them. Models are built around an extended biomass composition, the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
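The idea behind gapfilling can be illustrated with a small sketch. This is not the PlantSEED implementation, which the abstract does not describe in detail. It is a simplified, reachability-based stand-in that searches a reference reaction database for the smallest set of reactions that lets a draft model produce its biomass targets. The reaction names, metabolites, and the functions `reachable` and `gapfill` are all hypothetical and exist only for illustration.

```python
from itertools import combinations


def reachable(reactions, seeds):
    """Return every metabolite producible from `seeds` using `reactions`.

    `reactions` maps a reaction name to a (substrates, products) pair of sets.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gapfill(model, database, seeds, targets):
    """Return a smallest list of database reactions that makes all targets
    producible, or None if no combination works (brute force, toy scale only)."""
    candidates = sorted(set(database) - set(model))
    for size in range(len(candidates) + 1):
        for added in combinations(candidates, size):
            trial = dict(model)
            trial.update({name: database[name] for name in added})
            if targets <= reachable(trial, seeds):
                return list(added)
    return None


# Hypothetical toy reference database (not real PlantSEED content).
database = {
    "R1": ({"glucose"}, {"g6p"}),
    "R2": ({"g6p"}, {"pyruvate"}),
    "R3": ({"pyruvate"}, {"alanine"}),
    "R4": ({"g6p"}, {"ribose5p"}),
    "R5": ({"glucose"}, {"sorbitol"}),
}

# Draft model built from annotations; R2 has no annotated gene, so it is absent.
model = {name: database[name] for name in ("R1", "R3", "R4")}

missing = gapfill(model, database, seeds={"glucose"}, targets={"alanine", "ribose5p"})
print(missing)
```

Each reaction returned by such a search marks a function with no annotated gene in the draft model, which is where candidate genes would then be proposed. Production gapfilling methods typically pose this as a constrained optimization over reaction fluxes rather than an exhaustive search.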
