Prediction of Drug Activity Using Molecular Fragments-Based Representation and RFE Support Vector Machine Algorithm

This paper describes the use of a support vector machine (SVM) algorithm to classify the molecules in a database in order to predict drug activity. The molecules in the database are fragmented, and each molecule is represented by the set of fragments it contains. Weighted molecular descriptors are tested for representing the molecular fragments, so that the dataset is represented as an M×F array (molecules by fragments) in which each element takes the value of the weighted molecular descriptor calculated for the fragment. Because weighted descriptors take into account the distances and heteroatoms present in the fragments, the representation space allows structurally similar fragments to be discriminated. An SVM algorithm is used to classify the training set. The activity of the test set is then predicted from the results of the training stage together with a proposed heuristic. The results obtained show that the use of weighted molecular descriptors improves the prediction of drug activity for heterogeneous datasets.
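The M×F representation described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `build_fragment_matrix`, the use of SMILES-like strings as fragment identifiers, the descriptor values, and the choice of 0.0 for fragments absent from a molecule are all hypothetical. The abstract does not say which weighted descriptors are used or how absent fragments are encoded.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each molecule maps a fragment identifier to a
# precomputed weighted-descriptor value for that fragment.
Molecule = Dict[str, float]


def build_fragment_matrix(
    molecules: List[Molecule],
) -> Tuple[List[str], List[List[float]]]:
    """Return (fragment column labels, M x F matrix).

    Rows are molecules and columns are the distinct fragments found in the
    whole dataset. A fragment that does not occur in a molecule is encoded
    as 0.0, which is an assumption made for this sketch.
    """
    fragments = sorted({frag for mol in molecules for frag in mol})
    matrix = [[mol.get(frag, 0.0) for frag in fragments] for mol in molecules]
    return fragments, matrix


# Toy dataset with made-up descriptor values. The benzene-like and
# pyridine-like rings get different values, mimicking a descriptor that
# is sensitive to heteroatoms.
molecules = [
    {"c1ccccc1": 12.5, "C(=O)O": 7.25},
    {"c1ccccc1": 12.5, "CN": 4.0},
    {"c1ccncc1": 13.75},
]

fragments, X = build_fragment_matrix(molecules)
for label, row in zip(("mol_1", "mol_2", "mol_3"), X):
    print(label, row)
```

Each row of `X` could then be passed, together with an activity label, to an SVM classifier. The feature elimination step and the prediction heuristic are not shown here because the abstract does not specify them.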
