Unique identifiers for small molecules enable rigorous labeling of their atoms

Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its ‘photo ID’, is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best current approach to meeting this requirement is the IUPAC International Chemical Identifier (InChI). However, the current implementation of InChI does not provide a complete standard for atom nomenclature, and incorrect use of the InChI standard has led to the proliferation of non-unique identifiers. We propose a methodology and associated software tools, named ALATIS, that overcome these shortcomings. ALATIS is an adaptation of InChI that operates fully within the InChI convention to provide unique and reproducible identifiers for molecules and for all of their atoms. ALATIS includes an InChI extension for unique atom labeling of symmetric molecules. ALATIS forms the basis for improving reproducibility and unifying cross-referencing across databases.
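The core idea of deriving atom labels from InChI's canonical ordering can be illustrated with the auxiliary information (AuxInfo) string that InChI software can emit alongside an identifier. The sketch below is not the ALATIS implementation. It assumes a single-component structure, reads the /N: layer (the input atom numbers listed in canonical order, which under standard InChI settings covers only non-hydrogen atoms), and reads the optional /E: layer, assuming its symmetry-equivalent groups are written as parenthesised, comma-separated canonical numbers. All function names are hypothetical, and the example AuxInfo string is written by hand for illustration rather than copied from InChI output.

```python
import re


def parse_layers(auxinfo: str) -> dict[str, str]:
    """Split an AuxInfo string into {prefix: content} for its 'X:...' layers."""
    layers = {}
    for part in auxinfo.split("/"):
        m = re.match(r"^([A-Za-z]+):(.*)$", part)
        if m:
            layers[m.group(1)] = m.group(2)
    return layers


def canonical_to_input(auxinfo: str) -> dict[int, int]:
    """Map canonical atom number -> input atom number using the /N: layer."""
    n_layer = parse_layers(auxinfo).get("N")
    if n_layer is None:
        raise ValueError("no /N: layer found")
    if ";" in n_layer:
        # Components are separated by ';'; this sketch handles one component.
        raise ValueError("multi-component structures are not handled here")
    originals = [int(x) for x in n_layer.split(",")]
    # The i-th entry (1-based) is the input number of canonical atom i.
    return {canon: orig for canon, orig in enumerate(originals, start=1)}


def equivalence_classes(auxinfo: str) -> list[list[int]]:
    """Groups of symmetry-equivalent atoms (canonical numbers) from /E:."""
    e_layer = parse_layers(auxinfo).get("E")
    if not e_layer:
        return []
    groups = re.findall(r"\(([^)]*)\)", e_layer)
    return [[int(x) for x in grp.split(",")] for grp in groups]


def ambiguous_input_atoms(auxinfo: str) -> set[int]:
    """Input atoms whose canonical label is tied with a symmetric partner."""
    mapping = canonical_to_input(auxinfo)
    return {
        mapping[c]
        for grp in equivalence_classes(auxinfo)
        if len(grp) > 1
        for c in grp
    }


if __name__ == "__main__":
    # Hand-written AuxInfo for acetone whose input file lists the atoms as
    # C(methyl), C(carbonyl), C(methyl), O.
    aux = "AuxInfo=1/0/N:1,3,2,4/E:(1,2)"
    print(canonical_to_input(aux))     # {1: 1, 2: 3, 3: 2, 4: 4}
    print(equivalence_classes(aux))    # [[1, 2]]
    print(ambiguous_input_atoms(aux))  # {1, 3}
```

The last function highlights the gap the abstract describes. The canonical number alone does not settle which of two symmetry-equivalent atoms (here, the two methyl carbons) receives which label, and hydrogens are not numbered at all, so a further, reproducible tie-breaking and hydrogen-labeling rule is needed on top of InChI's output.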
