The recent progress in proteochemometric modelling: focusing on target descriptors, cross‐term descriptors and application scope

Abstract: As an extension of conventional quantitative structure‐activity relationship (QSAR) models, proteochemometric (PCM) modelling is a computational method that predicts bioactivity relationships between multiple ligands and multiple targets. Traditional PCM modelling comprises three essential elements: descriptors (target descriptors, ligand descriptors and cross‐term descriptors), bioactivity data, and learning functions that link the descriptors to the bioactivity data. PCM modelling has developed rapidly over the past decade, driven by advances in descriptors and machine learning techniques and by the growing amount of available bioactivity data. In particular, newly emerging target descriptors and cross‐term descriptors have not only significantly improved the performance of PCM models but also expanded their application scope from traditional protein‐ligand interactions to a wider range of interactions, including protein‐peptide, protein‐DNA and even protein‐protein interactions. This review summarizes target descriptors and cross‐term descriptors in detail, together with their corresponding application scope. We also anticipate that PCM modelling will extend into new application areas, such as Target‐Catalyst‐Ligand systems, as descriptors, machine learning techniques and available bioactivity data continue to develop.
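To make the descriptor element concrete, the following is a minimal sketch of how a single ligand-target pair might be turned into one PCM feature vector. It assumes the cross-terms are simple pairwise products of ligand and target descriptors, which is one common choice among several. The function names and the toy descriptor values are hypothetical and chosen only for illustration.

```python
from itertools import product


def cross_terms(ligand_desc, target_desc):
    """Multiplicative cross-terms: each ligand descriptor times each target descriptor."""
    return [l * t for l, t in product(ligand_desc, target_desc)]


def pcm_features(ligand_desc, target_desc):
    """Concatenate the ligand, target and cross-term blocks into one feature vector."""
    return list(ligand_desc) + list(target_desc) + cross_terms(ligand_desc, target_desc)


# Hypothetical toy descriptors (illustrative values, not real data).
ligand = [0.5, 2.0]        # two ligand descriptors
target = [1.0, -1.0, 3.0]  # three target descriptors

features = pcm_features(ligand, target)
print(len(features))  # 2 ligand + 3 target + 2*3 cross-terms = 11
```

In a full PCM workflow, one such vector would be built for every ligand-target pair with a measured bioactivity, and a learning function of the modeller's choice would then be fit to map these vectors to the bioactivity values.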
