Predicting protein-ligand binding site with differential evolution and support vector machine

Identification of protein-ligand binding sites is an important task in structure-based drug design and in docking algorithms. Over the past two decades, many approaches have been developed to predict binding sites, including geometric, energetic and sequence-based methods. Once these methods have computed their scores, the choice of classification method is critical and can strongly affect the prediction results. We develop a support vector machine (SVM) to identify the pockets most likely to bind ligands. Its attributes are the grid value, interaction potential, offset from the protein, conservation score and information about the region around each pocket. The SVM is sensitive to its input parameters, and positive samples are more relevant than negative ones, so differential evolution (DE) is applied to find suitable SVM parameters. We compare our algorithm with four other approaches: LIGSITE, SURFNET, PocketFinder and Concavity. Our algorithm achieves the highest success rate among these methods.
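The abstract does not specify the SVM kernel, the DE variant, which SVM parameters are tuned, or how fitness is scored. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. It assumes a linear soft-margin SVM fitted by subgradient descent and a per-class weight on positive samples, one way to encode that positive pockets matter more. It also assumes the classic DE/rand/1/bin scheme searching over log10(C) and that weight, with balanced accuracy on a held-out set as the fitness. The two synthetic features and every name in the code (`make_toy_pockets`, `train_weighted_linear_svm`, `svm_fitness`, and so on) are hypothetical stand-ins for the real pocket attributes.

```python
import math
import random

# Sketch only: linear SVM + DE/rand/1/bin. Kernel, tuned parameters and
# fitness are illustrative assumptions, not taken from the paper.


def make_toy_pockets(n_pos, n_neg, rng):
    """Hypothetical toy data: two synthetic features per pocket, label +1
    (binds a ligand) or -1 (does not); positives are deliberately rarer."""
    pos = [([rng.gauss(1.0, 0.8), rng.gauss(1.0, 0.8)], 1) for _ in range(n_pos)]
    neg = [([rng.gauss(-1.0, 0.8), rng.gauss(-1.0, 0.8)], -1) for _ in range(n_neg)]
    return pos + neg


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_weighted_linear_svm(data, C, pos_weight, epochs=80):
    """Linear soft-margin SVM fitted by full-batch subgradient descent.

    Minimises 0.5*||w||^2 + (C/n) * sum_i c_i * max(0, 1 - y_i*(w.x_i + b)),
    with c_i = pos_weight for positive samples and 1 for negatives.
    """
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for t in range(1, epochs + 1):
        lr = 0.5 / math.sqrt(t)  # decaying step size
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y in data:
            if y * (dot(w, x) + b) < 1:  # hinge loss is active for this sample
                c = C * (pos_weight if y > 0 else 1.0) / n
                for k in range(dim):
                    grad_w[k] -= c * y * x[k]
                grad_b -= c * y
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def balanced_accuracy(w, b, data):
    """Mean of the true-positive and true-negative rates."""
    tp = fn = tn = fp = 0
    for x, y in data:
        pred = 1 if dot(w, x) + b >= 0 else -1
        if y > 0:
            if pred > 0:
                tp += 1
            else:
                fn += 1
        elif pred < 0:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return 0.5 * (tpr + tnr)


def differential_evolution(fitness, bounds, pop_size=10, F=0.5, CR=0.9,
                           generations=15, seed=1):
    """DE/rand/1/bin maximising `fitness` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(p) for p in pop]
    for _ in range(generations):
        next_pop, next_scores = list(pop), list(scores)
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                else:
                    v = pop[i][j]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one selection
                next_pop[i], next_scores[i] = trial, score
        pop, scores = next_pop, next_scores
    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


train = make_toy_pockets(15, 60, random.Random(42))
valid = make_toy_pockets(10, 40, random.Random(7))


def svm_fitness(params):
    log_c, pos_weight = params  # C is searched on a log10 scale
    w, b = train_weighted_linear_svm(train, 10 ** log_c, pos_weight)
    return balanced_accuracy(w, b, valid)


best, best_score = differential_evolution(svm_fitness, [(-2.0, 1.0), (1.0, 5.0)])
print(f"log10(C)={best[0]:.2f} pos_weight={best[1]:.2f} "
      f"balanced accuracy={best_score:.3f}")
```

Searching C on a log scale is a common choice because useful values span orders of magnitude. The bounds, population size, F = 0.5 and CR = 0.9 are illustrative defaults, not values from the paper. A kernel SVM would add further parameters (for example an RBF width) as extra DE dimensions.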
