Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets

Background

While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA).

Results

In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though each later principal component captures less of the variation in the original input data.

Conclusion

In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
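As an illustration of how a PCA-based comparison of descriptor sets could be set up, the following minimal sketch represents each descriptor set by its profile of pairwise amino acid distances and then projects those profiles with PCA. This is an assumption about one plausible workflow, not the procedure used in the study: the abstract states only that PCA was applied. The function names (`pairwise_distances`, `compare_descriptor_sets`) are hypothetical, and the descriptor values are random placeholders, not published Z-scale, VHSE, or ProtFP values.

```python
from itertools import combinations

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def pairwise_distances(descriptors):
    """Euclidean distances for all unordered amino acid pairs (190 for 20 residues)."""
    pairs = combinations(range(len(descriptors)), 2)
    return np.array(
        [np.linalg.norm(descriptors[i] - descriptors[j]) for i, j in pairs]
    )


def compare_descriptor_sets(descriptor_sets, n_components=2):
    """Project descriptor sets into PCA space based on their distance profiles.

    descriptor_sets maps a name to an array of shape (20, n_dims).
    Sets that rank amino acid pairs similarly end up close together.
    """
    names = list(descriptor_sets)
    profiles = np.array([pairwise_distances(descriptor_sets[n]) for n in names])
    # z-score each profile so sets with different numeric scales are comparable
    profiles = (profiles - profiles.mean(axis=1, keepdims=True)) / profiles.std(
        axis=1, keepdims=True
    )
    # PCA via SVD of the column-centered profile matrix
    centered = profiles - profiles.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    return names, scores, explained


# Placeholder data (NOT real descriptor values): set_B is a noisy copy of
# set_A, while set_C is unrelated and uses more components per amino acid.
rng = np.random.default_rng(0)
base = rng.normal(size=(len(AMINO_ACIDS), 3))
placeholder_sets = {
    "set_A": base,
    "set_B": base + 0.05 * rng.normal(size=(len(AMINO_ACIDS), 3)),
    "set_C": rng.normal(size=(len(AMINO_ACIDS), 5)),
}

names, scores, explained = compare_descriptor_sets(placeholder_sets)
for name, score in zip(names, scores):
    print(name, np.round(score, 2))
```

In this toy setup the two related placeholder sets land close together in the score space while the unrelated set lies apart, which mirrors the kind of clustering the Results describe. With real descriptor tables substituted for the placeholders, the same projection could be used to look for complementary descriptor sets.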
