Optimal assignment kernels for attributed molecular graphs

We propose a new kernel function for attributed molecular graphs, based on the idea of computing an optimal assignment from the atoms of one molecule to those of another, taking into account each atom's neighborhood, its membership in a particular structural element, and other characteristics. As a byproduct, this leads to a new class of kernel functions. We demonstrate how the necessary computations can be carried out efficiently. Compared to marginalized graph kernels, our method in some cases leads to a significant reduction of the prediction error. Further improvement can be gained if expert knowledge is combined with our method. We also investigate a reduced graph representation of molecules in which certain structural elements, such as rings, are collapsed into a single node of the molecular graph.
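
The core computation can be illustrated with a small Python sketch. It is not the authors' implementation, and several parts are assumptions:

- Each atom is assumed to be already encoded as a numeric attribute vector, into which neighborhood and structural-element information would be folded.
- The atom kernel `atom_similarity` is a hypothetical RBF.
- The optimal assignment is computed with the Hungarian (Kuhn-Munkres) method, one standard polynomial-time way to solve the assignment problem.

The kernel value is the total similarity of the best one-to-one matching of the smaller molecule's atoms onto the larger molecule's atoms. `normalized_oa_kernel` rescales that value so every molecule has self-similarity 1, which is also an assumption here. `brute_force_assignment` is an exhaustive reference for checking tiny inputs.

```python
import itertools
import math


def hungarian(cost):
    """Minimum-cost assignment of rows to distinct columns (Kuhn-Munkres, O(n^2 m)).

    cost is an n x m list of lists with n <= m; returns assign[i] = column of row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row/column potentials (1-indexed)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row currently matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment the matching along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def brute_force_assignment(cost):
    """Exhaustive reference for tiny inputs: try every injective row -> column map."""
    n, m = len(cost), len(cost[0])
    return min(itertools.permutations(range(m), n),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))


def atom_similarity(a, b, gamma=0.5):
    """Hypothetical atom kernel: RBF on attribute vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def oa_kernel(mol_a, mol_b, gamma=0.5):
    """Total similarity of the best one-to-one assignment of the smaller molecule's atoms."""
    if not mol_a or not mol_b:
        return 0.0
    if len(mol_a) > len(mol_b):
        mol_a, mol_b = mol_b, mol_a
    sim = [[atom_similarity(x, y, gamma) for y in mol_b] for x in mol_a]
    # Maximizing total similarity = minimizing its negative.
    assign = hungarian([[-s for s in row] for row in sim])
    return sum(sim[i][j] for i, j in enumerate(assign))


def normalized_oa_kernel(mol_a, mol_b, gamma=0.5):
    """Rescale so that every (non-empty) molecule has self-similarity 1."""
    return oa_kernel(mol_a, mol_b, gamma) / math.sqrt(
        oa_kernel(mol_a, mol_a, gamma) * oa_kernel(mol_b, mol_b, gamma))


if __name__ == "__main__":
    # Hypothetical attribute vectors: (scaled atomic number, partial charge, in-ring flag)
    mol_1 = [(0.6, -0.10, 1.0), (0.6, 0.05, 1.0), (0.8, -0.40, 0.0)]
    mol_2 = [(0.6, 0.00, 0.0), (0.8, -0.40, 0.0)]
    print(normalized_oa_kernel(mol_1, mol_2))
```

The brute-force search grows factorially with the number of atoms. The Hungarian method runs in O(n^2 m) time for n <= m atoms, which is what keeps an assignment-based kernel tractable for larger molecules. Under the reduced graph representation, the same code would apply unchanged, with each collapsed ring node contributing one attribute vector in place of its member atoms.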
