Fragment-Based Ligand-Protein Contact Statistics: Application to Docking Simulations

This work collects the information contained in the contacts between fragments of small-molecule ligands and protein residues, and tests whether that information can be exploited, using the scoring of docking simulations as a proof of concept. Contact statistics between small-molecule fragments and binding site residues were collected and analyzed starting from a dataset of more than 200,000 binding sites and their associated ligands, derived from the database of the LIBRA ligand binding site recognition software. The fragments were generated with the BRICS decomposition algorithm. A simple “potential” based on the contact frequencies was tested against the CASF-2013 benchmark; its performance was then evaluated by rescoring docking poses generated for the DUD-E dataset. The results indicate that this approach, despite its simplicity, yields promising results that are comparable, and in some cases superior, to those obtained with other, more complex scoring functions.
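As an illustration of how a contact-frequency “potential” of this kind might be built, the following minimal sketch turns fragment-residue contact counts into log-odds scores and sums them over the contacts of a docking pose. The abstract does not give the functional form used in the paper, so the log-odds expression, the pseudocount, and the names `build_potential` and `score_pose` are assumptions made only for this example.

```python
import math
from collections import Counter


def build_potential(contacts, pseudocount=1.0):
    """Derive a log-odds score for every (fragment, residue) pair.

    contacts: iterable of (fragment_id, residue_name) tuples, one per
    observed contact. The log-odds form and the pseudocount are
    illustrative assumptions, not the formula from the paper.
    """
    contacts = list(contacts)
    total = len(contacts)
    pair_counts = Counter(contacts)
    frag_counts = Counter(frag for frag, _ in contacts)
    res_counts = Counter(res for _, res in contacts)

    potential = {}
    for frag, n_frag in frag_counts.items():
        for res, n_res in res_counts.items():
            # Observed count vs. the count expected if fragment and
            # residue types were paired independently of each other.
            observed = pair_counts[(frag, res)] + pseudocount
            expected = n_frag * n_res / total + pseudocount
            # Negative values mark contacts seen more often than expected.
            potential[(frag, res)] = -math.log(observed / expected)
    return potential


def score_pose(pose_contacts, potential):
    """Sum the pair scores over the contacts of one pose (lower is better).

    Pairs never seen during training contribute 0.0 (an assumption).
    """
    return sum(potential.get(pair, 0.0) for pair in pose_contacts)
```

Under these assumptions, rescoring a set of docking poses amounts to calling `score_pose` on the fragment-residue contacts of each pose and ranking the poses by the resulting sum.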
