Exploring the potential of 3D Zernike descriptors and SVM for protein–protein interface prediction

Background

The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. Several computational methods for predicting protein interfaces have been developed to date, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary considerably depending on the interaction type and function. Selecting an optimal set of features to characterise the protein interface, and developing an effective method to represent and capture the complex protein recognition patterns, are of paramount importance for this task.

Results

In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as the classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared with other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI).

Conclusions

The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface, which arise from the various spatial arrangements of the underlying residues, and their use can easily be extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
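The roto-translation invariance mentioned in the Results can be illustrated with a small sketch. It does not compute 3D Zernike moments: as a simplified stand-in, it uses property-weighted radial moments about the patch centroid, which have the same invariance property. The function names, the patch coordinates and the property values are all hypothetical and chosen only for illustration.

```python
import math


def radial_moments(points, values, max_order=3):
    """Toy roto-translation-invariant descriptor (NOT 3D Zernike moments).

    points: list of (x, y, z) surface points belonging to one patch
    values: one property value per point (e.g. a hydrophobicity-like score)
    Returns [sum_i v_i * r_i**n for n in 0..max_order], where r_i is the
    distance of point i from the patch centroid.
    """
    n_pts = len(points)
    centroid = tuple(sum(p[k] for p in points) / n_pts for k in range(3))
    radii = [math.dist(p, centroid) for p in points]
    return [
        sum(v * r ** n for v, r in zip(values, radii))
        for n in range(max_order + 1)
    ]


def rotate_z_then_translate(points, angle, shift):
    """Rotate points about the z axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in points
    ]


# Hypothetical patch: four surface points with made-up property values.
patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, -0.5), (1.5, 1.0, 1.0)]
props = [0.2, -1.0, 0.7, 1.3]

original = radial_moments(patch, props)
moved = radial_moments(
    rotate_z_then_translate(patch, 0.8, (3.0, -2.0, 5.0)), props
)

# The descriptor should agree up to floating-point error after the rigid motion.
invariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(original, moved)
)
print(invariant)
```

Because the descriptor depends only on distances from the patch centroid, any rigid motion leaves it unchanged. This is the property that allows patches from differently oriented proteins to be compared directly and passed as fixed-length samples to a classifier such as an SVM.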
