Proteo-chemometrics interaction fingerprints of protein-ligand complexes predict binding affinity

MOTIVATION: Reliable predictive models of protein-ligand binding affinity are required in many areas of biomedical research, but accurate prediction based on current descriptors or molecular fingerprints remains a challenge. We develop novel interaction fingerprints (IFPs) that encode protein-ligand interactions and use them to improve prediction.

RESULTS: We developed proteo-chemometrics IFPs (PrtCmm IFPs) by combining extended connectivity fingerprints (ECFPs) with the proteo-chemometrics concept. Combining PrtCmm IFPs with machine-learning models yielded efficient scoring models, which were validated on the PDBbind v2019 core set and the CSAR-HiQ sets. The PrtCmm IFP Score outperformed several other models in predicting protein-ligand binding affinities. In addition, conventional ECFPs were simplified to generate new IFPs, which provided consistent but faster predictions. The relationship between the base atom properties of ECFPs and prediction accuracy was also investigated.

AVAILABILITY: PrtCmm IFP has been implemented in the IFP Score Toolkit on GitHub at https://github.com/debbydanwang/IFPscore.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
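For readers unfamiliar with extended connectivity fingerprints, the following is a minimal sketch of the general ECFP idea only: each atom's identifier is repeatedly re-hashed together with its neighbours' identifiers, and the resulting identifiers are folded into a fixed-length bit vector. It is not the PrtCmm IFP construction, which this abstract does not specify. The function name `ecfp_like_bits`, the string atom labels, the CRC32 hash, and the default `radius` and `n_bits` values are all assumptions chosen for illustration, and the sketch omits the duplicate-substructure removal that full ECFP implementations perform.

```python
import zlib


def ecfp_like_bits(atom_labels, bonds, radius=2, n_bits=64):
    """Toy ECFP-style fingerprint (illustrative only, not the paper's method).

    atom_labels: list of strings, one per atom (hypothetical base atom property).
    bonds: list of (i, j) index pairs describing undirected bonds.
    """
    # Build an adjacency list from the undirected bond pairs.
    neighbors = {i: [] for i in range(len(atom_labels))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def stable_hash(text):
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(text.encode("utf-8"))

    # Iteration 0: each atom's identifier depends only on its own label.
    ids = [stable_hash(label) for label in atom_labels]
    features = set(ids)

    # Each further iteration widens the encoded neighbourhood by one bond.
    for _ in range(radius):
        new_ids = []
        for i in range(len(ids)):
            env = sorted(ids[j] for j in neighbors[i])  # order-independent
            new_ids.append(stable_hash(f"{ids[i]}|{env}"))
        ids = new_ids
        features.update(ids)

    # Fold the collected identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1
    return bits


# Ethanol heavy atoms written in two different atom orders (C-C-O and O-C-C).
fp_a = ecfp_like_bits(["C", "C", "O"], [(0, 1), (1, 2)])
fp_b = ecfp_like_bits(["O", "C", "C"], [(0, 1), (1, 2)])
```

Because neighbour identifiers are sorted before hashing and the features are collected in a set, the two atom orderings above give the same bit vector. A vector of this kind could then serve as input to a regression model, which is the general pattern the abstract describes for pairing fingerprints with machine-learning scoring models.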
