New approaches to protein docking

In the first part of this work, we propose new methods for protein docking. We first present two approaches to protein docking with flexible side chains: a fast greedy heuristic and a branch-and-cut algorithm that yields optimal solutions. For a test set of protease-inhibitor complexes, both approaches correctly predict the true complex structure.

Another, largely unsolved, problem in protein docking is the prediction of the binding free energy, which is the final step of many protein docking algorithms. We therefore propose a new approach that avoids the expensive and difficult calculation of the binding free energy and instead employs a scoring function based on the similarity between the proton nuclear magnetic resonance (NMR) spectra of the tentative complexes and the experimental spectrum. Using this method, we were able to predict the structure of a very difficult protein-peptide complex that could not be solved with any energy-based scoring function.

The second part of this work presents BALL (Biochemical ALgorithms Library), a framework for rapid application development in the field of molecular modeling. BALL provides an extensive set of data structures as well as classes for molecular mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, NMR shift prediction, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Its extensibility in particular, which results from an object-oriented and generic programming approach, distinguishes it from other software packages.
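The idea of scoring tentative complexes by spectral similarity rather than by binding free energy can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so the cosine similarity below is an assumption chosen for illustration only. The function names, the candidate labels, and the toy intensity values are likewise hypothetical, and the step that predicts a spectrum from a candidate structure is not shown.

```python
from math import sqrt


def normalize(spectrum):
    """Scale a spectrum (intensities on a shared ppm grid) to unit Euclidean norm."""
    norm = sqrt(sum(x * x for x in spectrum))
    if norm == 0.0:
        raise ValueError("spectrum contains no signal")
    return [x / norm for x in spectrum]


def spectral_similarity(predicted, experimental):
    """Cosine similarity of two spectra sampled on the same grid.

    Assumption: cosine similarity stands in for the (unspecified)
    similarity measure used in the original work.
    """
    if len(predicted) != len(experimental):
        raise ValueError("spectra must be sampled on the same grid")
    p = normalize(predicted)
    e = normalize(experimental)
    return sum(a * b for a, b in zip(p, e))


def rank_candidates(candidate_spectra, experimental):
    """Return candidate names ordered from most to least similar."""
    scores = {
        name: spectral_similarity(spectrum, experimental)
        for name, spectrum in candidate_spectra.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: hypothetical intensities, not taken from any real measurement.
experimental = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0]
candidates = {
    "pose_a": [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 0.0],  # peaks line up with experiment
    "pose_b": [2.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0],  # peaks shifted and rearranged
}

ranking = rank_candidates(candidates, experimental)
print(ranking)
```

In this sketch the candidate whose predicted spectrum best overlaps the experimental one is ranked first, so no energy term enters the ranking at all.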
