Automated Bond Order Assignment as an Optimization Problem

MOTIVATION: Numerous applications in computational biology process molecular structures. They therefore rely not only on correct atomic coordinates but also on correct bond order information. Bond orders are easy to deduce for proteins and nucleic acids, but this does not hold for other types of molecules such as ligands. Molecular databases do not always provide bond order information for ligands, and a variety of approaches to this problem have been developed. In this work, we extend an ansatz proposed by Wang et al. that assigns connectivity-based penalty scores and tries to approximate the optimum heuristically. We present three efficient and exact solvers for the resulting penalty minimization problem, which replace the heuristic approximation scheme of the original approach: an A* approach, an integer linear programming (ILP) approach and a fixed-parameter tractable (FPT) approach.

RESULTS: We implemented our A*, ILP and FPT formulations and evaluated them, together with the original implementation, on the MMFF94 validation suite and the KEGG Drug database. We show the benefit of computing exact solutions of the penalty minimization problem and the additional gain obtained by computing all optimal (or even suboptimal) solutions. We close with a detailed comparison of our methods.

AVAILABILITY: The A* and ILP solutions are integrated into the open-source C++ LGPL library BALL and into the molecular visualization and modelling tool BALLView. Both can be downloaded from our homepage www.ball-project.org. The FPT implementation can be downloaded from http://bio.informatik.uni-jena.de/software/.
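To make the penalty minimization concrete, the following is a minimal sketch of an exact A* search over bond orders. It is not the BALL implementation and does not reproduce the scores of Wang et al. The penalty table `PENALTY`, the helper names `atom_lower_bound` and `astar_bond_orders`, and the restriction to bond orders 1 to 3 are hypothetical choices made for illustration. In the real scheme, penalties depend on atom type and connectivity, not only on the element. Each search state fixes the orders of a prefix of the bond list. Completed atoms contribute their exact penalty, and every other atom contributes the smallest penalty it could still reach, which gives an admissible heuristic.

```python
import heapq
from itertools import count

# HYPOTHETICAL penalty table: element -> {valence: penalty}. The values are
# illustrative only; valences missing from a table are treated as infeasible.
PENALTY = {
    "C": {4: 0, 3: 1, 5: 32},
    "O": {2: 0, 1: 1, 3: 64},
    "H": {1: 0},
}
MAX_ORDER = 3  # assumption: bond orders are restricted to 1, 2 or 3


def atom_lower_bound(elem, partial, remaining):
    """Smallest penalty `elem` can still reach, given its bond-order sum `partial`
    so far and `remaining` unassigned incident bonds; None if no valence fits."""
    lo, hi = partial + remaining, partial + MAX_ORDER * remaining
    feasible = [p for v, p in PENALTY[elem].items() if lo <= v <= hi]
    return min(feasible) if feasible else None


def astar_bond_orders(atoms, bonds):
    """A* over partial bond-order assignments, bonds fixed in list order.

    atoms: element symbols; bonds: (i, j) atom index pairs.
    Returns (total_penalty, orders) for one optimal assignment, or None.
    """
    degree = [0] * len(atoms)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1

    def f_score(orders):
        # Accumulate valence and number of assigned bonds per atom.
        partial = [0] * len(atoms)
        assigned = [0] * len(atoms)
        for (i, j), order in zip(bonds, orders):
            for a in (i, j):
                partial[a] += order
                assigned[a] += 1
        # Completed atoms contribute their exact penalty (g); the others add an
        # optimistic lower bound (h), so f = g + h never overestimates.
        total = 0
        for a, elem in enumerate(atoms):
            bound = atom_lower_bound(elem, partial[a], degree[a] - assigned[a])
            if bound is None:
                return None  # this partial assignment cannot be completed
            total += bound
        return total

    start = f_score(())
    if start is None:
        return None
    tie = count()  # tie-breaker so heapq never compares the order tuples
    heap = [(start, next(tie), ())]
    while heap:
        f, _, orders = heapq.heappop(heap)
        if len(orders) == len(bonds):
            return f, list(orders)  # every atom is complete, so f is exact
        for order in range(1, MAX_ORDER + 1):
            child = orders + (order,)
            child_f = f_score(child)
            if child_f is not None:
                heapq.heappush(heap, (child_f, next(tie), child))
    return None


# Formaldehyde, H2C=O: atoms 0..3, bonds C-O, C-H, C-H.
print(astar_bond_orders(["C", "O", "H", "H"], [(0, 1), (0, 2), (0, 3)]))
# prints (0, [2, 1, 1])
```

Fixing more incident bonds only narrows the range of valences an atom can still reach, so the per-atom bound never decreases along a search path. As a result, the first complete assignment removed from the queue is optimal. Enumerating all optimal solutions would require continuing to pop states until the queue's smallest score exceeds the optimum, and this sketch does not implement that step.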

[1] Hans-Peter Lenhof, et al. BALL-rapid software prototyping in computational molecular biology, 2000, Bioinform.

[2] Friedrich Rippmann, et al. BALI: Automatic Assignment of Bond and Atom Types for Protein Ligands in the Brookhaven Protein Databank, 1997, J. Chem. Inf. Comput. Sci.

[3] Matheus Froeyen, et al. Correct Bond Order Assignment in a Molecular Framework Using Integer Linear Programming with Application to Molecules Where Only Non-Hydrogen Atom Coordinates Are Available, 2005, J. Chem. Inf. Model.

[4] Nils J. Nilsson, et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths, 1968, IEEE Trans. Syst. Sci. Cybern.

[5] Bonnie Berger, et al. A tree-decomposition approach to protein structure prediction, 2005, 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05).

[6] P. Kollman, et al. Automatic atom type and bond type perception in molecular mechanical calculations, 2006, Journal of molecular graphics & modelling.

[7] D. M. F. Aalten, et al. PRODRG, a program for generating molecular topologies and unique molecular descriptors from coordinates of small molecules, 1996, J. Comput. Aided Mol. Des.

[8] Sabine C. Mueller, et al. BALL - biochemical algorithms library 1.3, 2010, BMC Bioinformatics.

[9] Ernst Althaus, et al. A combinatorial approach to protein docking with flexible side-chains, 2000, RECOMB '00.

[10] Paul Labute, et al. On the Perception of Molecules from 3D Atomic Coordinates, 2005, J. Chem. Inf. Model.

[11] Susumu Goto, et al. LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res.

[12] Christopher I. Bayly, et al. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation, 2002, J. Comput. Chem.

[13] Haruki Nakamura, et al. Announcing the worldwide Protein Data Bank, 2003, Nature Structural Biology.

[14] T. Halgren. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94, 1996, J. Comput. Chem.

[15] F. Allen. The Cambridge Structural Database: a quarter of a million crystal structures and rising, 2002, Acta crystallographica. Section B, Structural science.

[16] A R Leach, et al. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm, 1998, Proteins.

[17] Robin Taylor, et al. A new test set for validating predictions of protein–ligand interaction, 2002, Proteins.

[18] Derek G. Corneil, et al. Complexity of finding embeddings in a k-tree, 1987.

[19] Thomas A. Halgren. MMFF VI. MMFF94s option for energy minimization studies, 1999, J. Comput. Chem.

[20] Sebastian Böcker, et al. Computing Bond Types in Molecule Graphs, 2009, COCOON.

[21] Ernst Althaus, et al. A Combinatorial Approach to Protein Docking with Flexible Side Chains, 2002, J. Comput. Biol.

[22] Vibhav Gogate, et al. A Complete Anytime Algorithm for Treewidth, 2004, UAI.

[23] Tiejun Cheng, et al. Automatic Perception of Organic Molecules Based on Essential Structural Information, 2007.

[24] Edward E. Hodgkin, et al. Automatic assignment of chemical connectivity to organic molecules in the Cambridge Structural Database, 1992, J. Chem. Inf. Comput. Sci.

[25] Kenneth Steiglitz, et al. Combinatorial Optimization: Algorithms and Complexity, 1981.

[26] Egon L. Willighagen, et al. The Blue Obelisk - Interoperability in Chemical Informatics, 2006, J. Chem. Inf. Model.

[27] Yair Weiss, et al. Minimizing and Learning Energy Functions for Side-Chain Prediction, 2007, RECOMB.

[28] J. Irwin, et al. ZINC - A Free Database of Commercially Available Compounds for Virtual Screening, 2005.

[29] Hans-Peter Lenhof, et al. BALLView: a tool for research and education in molecular modeling, 2006, Bioinform.