Universal Chemical Markup (UCM) - A new format for common chemical data

Background: We introduce a new chemical format called UCM (Universal Chemical Markup). The format is based on XML (Extensible Markup Language), and its first version focuses on recording chemical structures and their properties.

Results: UCM currently supports structures containing isotopes, ions and various types of bonding, including delocalized bonds. Properties are expressed by combining UCM with UnitsML (Units Markup Language): quantities and their scientific units are defined in UnitsML and then referred to in UCM when property values are recorded. Users can also add literature references with BibTeXML (BibTeX Markup Language) and annotate the recorded data with plain-text or XHTML (Extensible Hypertext Markup Language) descriptions. In contrast to presently available general-purpose chemical formats, UCM offers built-in validation that combines grammar-based and pattern-based XML schema languages. All recorded data can therefore be validated precisely against the UCM schemas in standard XML validators.

Conclusions: We developed the structure of UCM from scratch on the basis of an analysis described in our previous article. Starting from scratch allowed us to integrate BibTeXML, UnitsML and XHTML, as well as chemical line notations and identifiers, into UCM. It also helped us to avoid redundant parts and to create an implementation that aims to minimize ambiguity and is designed to be easily extensible in the future.
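The define-then-refer relationship between UnitsML quantities and UCM property values can be illustrated with a minimal sketch. All element and attribute names below (`record`, `quantity`, `property`, `quantityRef`) are hypothetical placeholders chosen for illustration; they are not taken from the UCM or UnitsML schemas. The check itself is only an informal stand-in for the kind of cross-reference rule that a pattern-based schema could express, not the validation that UCM actually ships.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only
# and are NOT taken from the UCM or UnitsML schemas.
DOC = """
<record>
  <units>
    <quantity id="q-melting-point" name="melting point" unit="K"/>
  </units>
  <compound name="water">
    <property quantityRef="q-melting-point" value="273.15"/>
    <property quantityRef="q-density" value="997"/>
  </compound>
</record>
"""


def unresolved_quantity_refs(xml_text):
    """Return quantityRef values that match no declared quantity id."""
    root = ET.fromstring(xml_text)
    # Collect the ids of every quantity defined in the units section.
    declared = {q.get("id") for q in root.iter("quantity")}
    # Keep the references that point at an id nobody declared.
    return [
        p.get("quantityRef")
        for p in root.iter("property")
        if p.get("quantityRef") not in declared
    ]


missing = unresolved_quantity_refs(DOC)
print(missing)  # ['q-density']
```

In this sketch the second property refers to a quantity that was never defined, so it is reported. A grammar-based schema would typically constrain which elements and attributes may appear, while a rule of this cross-referencing kind is the sort of constraint usually delegated to a pattern-based schema.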
