Encoding Protein-Ligand Interaction Patterns in Fingerprints and Graphs

We present a novel, general method for converting protein-ligand coordinates into a simple fingerprint of 210 integers that registers the corresponding molecular interaction pattern. Each interaction (hydrophobic, aromatic, hydrogen bond, ionic bond, metal complexation) is detected on the fly and physically described by a pseudoatom centered on the interacting ligand atom, on the interacting protein atom, or on the geometric center of the two interacting atoms. All possible triplets of interaction pseudoatoms are counted within six distance ranges, and the full integer vector is then pruned to keep only the most frequent triplets. The result is a simple (210 integers), coordinate frame-invariant interaction pattern descriptor (TIFP) that can be used to compare any pair of protein-ligand complexes. TIFP fingerprints have been calculated for ca. 10,000 druggable protein-ligand complexes, enabling a broad comparison of the relationships between interaction pattern similarity and pairwise ligand or binding site similarity. We notably show that interaction pattern similarity strongly depends on binding site similarity. In addition to the TIFP fingerprint, which registers intermolecular interactions between a ligand and its target protein, we developed two tools (Ishape, Grim) to align protein-ligand complexes from their interaction patterns. Ishape is based on the overlap of interaction pseudoatoms using a smooth Gaussian function, whereas Grim uses a standard clique detection algorithm to match interaction pattern graphs. The two tools are complementary and enable protein-ligand complex alignments that capitalize on both global and local pattern similarities.
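The triplet-counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the distance bin edges, the interaction type labels, the canonical ordering of a triplet, and the function names are all assumptions made for illustration. The abstract states only that six distance ranges are used and that the most frequent triplets are kept.

```python
import math
from collections import Counter
from itertools import combinations, permutations

# Hypothetical bin edges in angstroms: the source says "six distance ranges"
# but does not give their boundaries.
BIN_EDGES = [0.0, 4.0, 6.0, 8.0, 10.0, 13.0, 17.0]


def distance_bin(d):
    """Return the index of the distance range containing d, or None if out of range."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= d < BIN_EDGES[i + 1]:
            return i
    return None


def triplet_key(a, b, c):
    """Order-independent key: three interaction types plus three binned distances.

    Each pseudoatom is a (type, (x, y, z)) tuple. The key is the smallest tuple
    over all six orderings, so it does not depend on input order, and it uses
    only inter-point distances, so it does not depend on the coordinate frame.
    """
    best = None
    for x, y, z in permutations((a, b, c)):
        bins = (
            distance_bin(math.dist(x[1], y[1])),
            distance_bin(math.dist(y[1], z[1])),
            distance_bin(math.dist(x[1], z[1])),
        )
        if None in bins:
            return None  # at least one distance falls outside all ranges
        key = (x[0], y[0], z[0]) + bins
        if best is None or key < best:
            best = key
    return best


def count_triplets(pseudoatoms):
    """Count every triplet of interaction pseudoatoms by its canonical key."""
    counts = Counter()
    for a, b, c in combinations(pseudoatoms, 3):
        key = triplet_key(a, b, c)
        if key is not None:
            counts[key] += 1
    return counts


def most_frequent_keys(all_counts, n_keep):
    """Pruning step: keep the n_keep triplet keys seen most often in a reference set."""
    total = Counter()
    for counts in all_counts:
        total.update(counts)
    return [key for key, _ in total.most_common(n_keep)]


def to_fingerprint(counts, kept_keys):
    """Fixed-length integer vector, one position per retained triplet key."""
    return [counts.get(key, 0) for key in kept_keys]
```

In this sketch, `most_frequent_keys` would be run once over a reference set of complexes with `n_keep=210`, and `to_fingerprint` would then map each complex onto the same 210 positions so that any two fingerprints are directly comparable.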
The new fingerprint and companion alignment tools have been successfully used in three scenarios: (i) interaction-biased alignment of protein-ligand complexes, (ii) postprocessing docking poses according to known interaction patterns for a particular target, and (iii) virtual screening for bioisosteric scaffolds sharing similar interaction patterns.
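The smooth Gaussian overlap on which Ishape is said to rely can be sketched as a scoring function for two sets of pseudoatoms that are already superposed. The functional form, the width `sigma`, the restriction to pseudoatoms of the same interaction type, and the Tanimoto-style normalization are assumptions for illustration and are not taken from the source. An alignment tool would additionally optimize this score over rigid-body transformations, which is not shown here.

```python
import math


def gaussian_overlap(set_a, set_b, sigma=1.0):
    """Sum of Gaussian overlaps between pseudoatoms of the same interaction type.

    Each pseudoatom is a (type, (x, y, z)) tuple. sigma is a hypothetical
    width in angstroms.
    """
    total = 0.0
    for type_a, xyz_a in set_a:
        for type_b, xyz_b in set_b:
            if type_a == type_b:
                d2 = sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b))
                total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def overlap_similarity(set_a, set_b, sigma=1.0):
    """Tanimoto-style normalization of the overlap onto the range [0, 1]."""
    o_ab = gaussian_overlap(set_a, set_b, sigma)
    o_aa = gaussian_overlap(set_a, set_a, sigma)
    o_bb = gaussian_overlap(set_b, set_b, sigma)
    denom = o_aa + o_bb - o_ab
    return o_ab / denom if denom > 0 else 0.0
```

Because the score decays smoothly with distance, small coordinate shifts between two complexes lower the similarity gradually instead of switching a match on or off, which is the practical appeal of a Gaussian over a hard distance cutoff.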
