A rule-based algorithm for automatic bond type perception

Assigning bond orders is an essential step in characterizing a chemical structure correctly for force-field-based simulations. Several methods have been developed for this task; each has advantages, but each also has limitations. Here we present an automatic algorithm that assigns chemical connectivity and bond orders to organic molecules whether or not hydrogen atoms are present; it requires only three-dimensional coordinates and element identities. The algorithm applies three kinds of rules to fix the structure: hard rules, length rules and conjugation rules. The hard rules determine bond orders from basic chemical rules; the length rules determine bond orders from the distance between two atoms, compared against a set of predefined values for different bond types; and the conjugation rules determine bond orders from the length information derived by the previous rules, the bond angles and a set of small structural patterns. The algorithm was evaluated extensively on three datasets and achieved good prediction accuracy on all of them. Finally, the limitations of the algorithm and possible future improvements are discussed.
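The abstract does not give the algorithm's parameters, so the following is only a minimal Python sketch of how a rule cascade of this kind can be organized. Several parts of it are assumptions and do not come from the paper. Connectivity is perceived from covalent radii plus a tolerance. A "hard rule" is modelled as "an atom whose neighbor count already reaches its maximum valence forms only single bonds". A "length rule" is modelled as per-element-pair distance cutoffs. All radii, valences, tolerances and cutoffs below, and the function names `perceive_connectivity`, `apply_hard_rules` and `apply_length_rules`, are hypothetical illustrative choices, not the authors' values or code. The conjugation rules (bond angles, structural patterns) are not shown. Distances that fall between the double- and single-bond cutoffs are simply left unassigned for that later stage.

```python
import math
from itertools import combinations

# HYPOTHETICAL parameters for illustration only; not the values used in the paper.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}  # Angstrom
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "S": 2}  # neutral atoms, no hypervalence
BOND_TOLERANCE = 0.45  # Angstrom added to the sum of covalent radii

# (triple_max, double_max, single_min) in Angstrom, keyed by the sorted element pair.
# Distances between double_max and single_min are treated as ambiguous.
LENGTH_CUTOFFS = {
    ("C", "C"): (1.25, 1.37, 1.45),
    ("C", "N"): (1.20, 1.32, 1.40),
    ("C", "O"): (1.15, 1.26, 1.36),
}


def perceive_connectivity(elements, coords):
    """Bond atoms i < j when their distance is at most r_i + r_j + tolerance."""
    bonds = []
    for i, j in combinations(range(len(elements)), 2):
        cutoff = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + BOND_TOLERANCE
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.append((i, j))
    return bonds


def apply_hard_rules(elements, bonds):
    """Set a bond to single if either atom is saturated by its neighbor count alone."""
    degree = [0] * len(elements)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    orders = {}
    for i, j in bonds:
        if any(degree[k] >= MAX_VALENCE[elements[k]] for k in (i, j)):
            orders[(i, j)] = 1
    return orders


def apply_length_rules(elements, coords, bonds, orders):
    """Assign orders from distance cutoffs to bonds the hard rules left open."""
    for i, j in bonds:
        if (i, j) in orders:
            continue  # already fixed by a hard rule
        key = tuple(sorted((elements[i], elements[j])))
        if key not in LENGTH_CUTOFFS:
            continue  # no length rule defined for this element pair
        triple_max, double_max, single_min = LENGTH_CUTOFFS[key]
        d = math.dist(coords[i], coords[j])
        if d <= triple_max:
            orders[(i, j)] = 3
        elif d <= double_max:
            orders[(i, j)] = 2
        elif d >= single_min:
            orders[(i, j)] = 1
        # Otherwise the distance is ambiguous and is left for the conjugation rules.
    return orders
```

As a usage example, the heavy atoms of acetaldehyde (methyl C, carbonyl C, O) with C-C = 1.50 Å and C=O ≈ 1.20 Å give the bonds `[(0, 1), (1, 2)]`. No atom is saturated, so the hard rules assign nothing, and the length rules then return `{(0, 1): 1, (1, 2): 2}`. A carbon with four carbon neighbors, by contrast, has all of its bonds fixed as single by the hard rule before any distance is consulted. This ordering reflects the idea that cheap, certain rules should run before distance heuristics.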
