Identification of protein-ligand binding site using multi-clustering and Support Vector Machine

Multi-clustering is a widely used technique. Here it serves as a pre-training step for identifying protein-ligand binding sites in structure-based drug design, after which a Support Vector Machine (SVM) classifies the candidate sites most likely to bind ligands. Three types of attributes are used: geometry-based, energy-based, and sequence conservation. The method is compared with LIGSITEcsc, SURFNET, Fpocket, Q-SiteFinder, ConCavity, and MetaPocket on 198 drug-target protein complexes, and the results show an improved success rate of up to 86%.
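The two-stage pipeline (cluster first, then classify) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: "multi-clustering" is read here as running plain k-means at several values of k over candidate probe points and pooling all resulting clusters as candidate sites; each probe point carries a hypothetical attribute vector standing in for the geometry, energy, and conservation attributes; and a linear SVM is trained with the Pegasos stochastic subgradient method, which is a swapped-in trainer rather than the authors' solver or kernel. All function names (`kmeans`, `multi_cluster`, `site_features`, `train_linear_svm`, `decision`, `rank_sites`) are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def multi_cluster(points: Sequence[Vector], ks: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Assumed reading of multi-clustering: pool the clusters found at several k values.

    Each candidate site is returned as a sorted list of point indices.
    """
    candidates = []
    for k in ks:
        labels = kmeans(points, k, seed=seed)
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                candidates.append(members)
    return candidates


def site_features(site: List[int], point_attrs: Sequence[Vector]) -> Vector:
    """Hypothetical site descriptor: the mean of the per-point attribute vectors."""
    dims = len(point_attrs[0])
    return [sum(point_attrs[i][j] for i in site) / len(site) for j in range(dims)]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> Vector:
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels are +1 / -1. A constant 1.0 is appended to every sample, so the
    last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w: Vector, x: Vector) -> float:
    """Signed SVM score; larger means 'more likely to be a binding site'."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def rank_sites(candidates: List[List[int]], point_attrs: Sequence[Vector], w: Vector) -> List[List[int]]:
    """Order candidate sites by SVM score, best first."""
    return sorted(candidates, key=lambda s: decision(w, site_features(s, point_attrs)), reverse=True)
```

As a usage example, with two well-separated groups of probe points, `multi_cluster(pts, (2,))` recovers the two groups, and `rank_sites` puts first the group whose (hypothetical) attributes resemble the positive training sites. Clustering before classification keeps the SVM working on a small set of site-level feature vectors instead of on every probe point.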
