TargetATPsite: A template‐free method for ATP‐binding sites prediction with residue evolution image sparse representation and classifier ensemble

Understanding the interactions between proteins and ligands is critical for protein function annotation and drug discovery. We report a new sequence‐based, template‐free predictor (TargetATPsite) that identifies adenosine‐5′‐triphosphate (ATP) binding sites with machine‐learning approaches. TargetATPsite works in two steps: it first predicts the binding residues and then predicts the binding pockets. To predict the binding residues, a novel image sparse representation technique is proposed to encode residue evolution information, and the encoded information serves as the input features. The prediction model is an ensemble classifier built from support vector machines (SVMs) trained on multiple random under‐samplings, which is effective for dealing with the imbalance between positive and negative training samples. Unlike existing ATP‐specific sequence‐based predictors, TargetATPsite can, in its second step, further identify the binding pockets from the predicted binding residues through a spatial clustering algorithm. Experimental results on three benchmark datasets demonstrate the efficacy of TargetATPsite. © 2013 Wiley Periodicals, Inc.
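
The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SVM kernel, the number of under-samplings, or the fusion rule, so everything below is an assumption made for illustration. A linear SVM trained by Pegasos-style sub-gradient descent stands in for the paper's SVM, the member outputs are fused by averaging their decision values, and the two-dimensional toy features replace the sparse-representation features, which are not reproduced here. The names `train_linear_svm`, `train_undersampling_ensemble`, `ensemble_score`, and `predict` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style sub-gradient descent (stand-in for the paper's SVM).

    Labels must be in {-1, +1}. A constant 1.0 is appended to every sample so
    that the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def decision_value(w, x):
    """Signed score of one sample under one linear model."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_undersampling_ensemble(X, y, n_models=5, seed=0):
    """Train one SVM per random under-sampling of the majority (negative) class.

    Assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    models = []
    for m in range(n_models):
        neg_subset = rng.sample(neg, len(pos))  # balanced subset, no replacement
        Xs = pos + neg_subset
        ys = [1] * len(pos) + [-1] * len(neg_subset)
        models.append(train_linear_svm(Xs, ys, seed=seed + m))
    return models


def ensemble_score(models, x):
    """Fuse the members by averaging their decision values (assumed fusion rule)."""
    return sum(decision_value(w, x) for w in models) / len(models)


def predict(models, x, threshold=0.0):
    """Return +1 (binding) or -1 (non-binding) for one feature vector."""
    return 1 if ensemble_score(models, x) > threshold else -1


# Made-up toy data: 10 positives and 100 negatives, mimicking class imbalance.
data_rng = random.Random(42)
X_pos = [[2.0 + data_rng.uniform(-1, 1), 2.0 + data_rng.uniform(-1, 1)] for _ in range(10)]
X_neg = [[-2.0 + data_rng.uniform(-1, 1), -2.0 + data_rng.uniform(-1, 1)] for _ in range(100)]
X = X_pos + X_neg
y = [1] * len(X_pos) + [-1] * len(X_neg)

models = train_undersampling_ensemble(X, y)
print(predict(models, [3.0, 3.0]), predict(models, [-3.0, -3.0]))
```

Each member sees all positive samples but only a balanced random subset of the negatives, so no single classifier is dominated by the majority class, while the ensemble as a whole still draws on many different negatives.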
