The Art of Compiling Protein Binding Site Ensembles

Structure-based drug design starts with the collection, preparation, and initial analysis of protein structures. With more than 115,000 structures publicly available in the Protein Data Bank (PDB), fully automated processes that perform these preprocessing steps reliably are needed. Several tools are available for these tasks; however, most of them do not address the special needs of scientists interested in protein-ligand interactions. In this paper, we summarize our research activities towards an automated processing pipeline that leads from raw PDB data to ready-to-use protein binding site ensembles. Starting from a single protein structure, the pipeline covers the following phases: extracting structurally related binding sites from the PDB, aligning disconnected binding site sequences, resolving tautomeric forms and protonation states, orienting hydrogens and flippable side chains, structurally aligning the multitude of binding sites, and performing a reasonable reduction of ensemble structures. The pipeline, named SIENA, creates protein structure ensembles within seconds for the analysis of protein flexibility and for molecular design efforts such as docking or de novo design. For the first time, we are able to process the whole PDB in order to create a large collection of protein binding site ensembles. SIENA is available as part of the ZBH ProteinsPlus web server at http://proteinsplus.zbh.uni-hamburg.de.
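
The last phase, reducing the ensemble, can be illustrated with a minimal sketch. This is not SIENA's actual reduction procedure, which the abstract does not specify; it is a simple greedy filter, assuming that the binding sites have already been superimposed and that their atoms are listed in the same order. The function names `rmsd` and `reduce_ensemble`, the site labels, and the cutoff value are hypothetical and chosen for illustration only.

```python
import math


def rmsd(a, b):
    """RMSD between two pre-aligned coordinate lists with matching atom order."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def reduce_ensemble(sites, cutoff):
    """Greedy reduction (illustrative assumption, not the published method).

    A site is kept only if its RMSD to every site kept so far exceeds
    `cutoff`, so near-duplicate conformations are dropped.
    """
    kept = []
    for name, coords in sites:
        if all(rmsd(coords, kept_coords) > cutoff for _, kept_coords in kept):
            kept.append((name, coords))
    return [name for name, _ in kept]


# Toy data: "B" is almost identical to "A", while "C" is clearly shifted.
sites = [
    ("A", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("B", [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]),
    ("C", [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]),
]
representatives = reduce_ensemble(sites, cutoff=0.5)
```

With these toy coordinates, "B" lies within the cutoff of "A" and is discarded, while "C" is retained as a distinct conformation. The result of a greedy filter depends on input order, which is one reason a real pipeline may prefer a clustering step instead.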
