BpForms and BcForms: Tools for concretely describing non-canonical polymers and complexes to facilitate comprehensive biochemical networks

Although non-canonical residues, caps, crosslinks, and nicks play an important role in the function of many DNAs, RNAs, proteins, and complexes, we do not fully understand how networks of non-canonical macromolecules generate behavior. One barrier is the limited set of formats, such as IUPAC, for abstractly describing macromolecules. To overcome this barrier, we developed BpForms and BcForms, a toolkit of ontologies, grammars, and software for abstracting the primary structure of polymers and complexes as combinations of residues, caps, crosslinks, and nicks. The toolkit can help researchers quality-control, exchange, and integrate information about the primary structure of macromolecules into fine-grained global networks of intracellular biochemistry.
