Graph Kernels for Molecular Structure-Activity Relationship Analysis with Support Vector Machines

The support vector machine algorithm, combined with graph kernel functions, has recently been introduced to model structure-activity relationships (SAR) of molecules from their 2D structure, without the need for explicit molecular descriptor computation. We propose two extensions to this approach with the dual goal of reducing the computational burden associated with the model and enhancing its predictive accuracy: describing the molecules by a Morgan index process, and defining a second-order Markov model for random walks on 2D structures. Experiments on two mutagenicity data sets validate the proposed extensions, making this approach a possible complementary alternative to other modeling strategies.
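As a rough illustration of the Morgan index process mentioned above, the following sketch assumes the usual formulation of Morgan's procedure: every atom starts with index 1, and at each iteration an atom's new index is the sum of its neighbors' current indices. The function names, the adjacency-list representation, and the pairing of atom symbol with index are assumptions made for this example, not details taken from the abstract.

```python
def morgan_indices(adjacency, iterations):
    """Return a Morgan index per atom after the given number of iterations.

    adjacency: dict mapping each atom id to a list of neighbor atom ids
    (an assumed representation of the 2D molecular graph).
    """
    # Assumed initialization: every atom starts with index 1.
    indices = {atom: 1 for atom in adjacency}
    for _ in range(iterations):
        # Each atom's new index is the sum of its neighbors' current indices.
        indices = {
            atom: sum(indices[n] for n in neighbors)
            for atom, neighbors in adjacency.items()
        }
    return indices


def relabel_atoms(atom_symbols, adjacency, iterations):
    """Pair each atom symbol with its Morgan index to form a refined label."""
    indices = morgan_indices(adjacency, iterations)
    return {atom: (atom_symbols[atom], indices[atom]) for atom in atom_symbols}


# Heavy-atom skeleton of ethanol, C-C-O, as a toy example.
symbols = {0: "C", 1: "C", 2: "O"}
bonds = {0: [1], 1: [0, 2], 2: [1]}
labels = relabel_atoms(symbols, bonds, iterations=1)
```

After one iteration the index equals the atom's degree, so the two carbons receive different labels. Refined labels of this kind make matching label sequences between two molecules rarer, which is one plausible way such a description could reduce the cost of a walk-based kernel.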
