Probabilistic Modeling of Conformational Space for 3D Machine Learning Approaches

We present a new probabilistic encoding of the conformational space of a molecule that can be integrated into common similarity calculations. The method uses distance profiles of flexible atom pairs and computes generative models that describe the distribution of these distances over the conformational space. Because the generative models permit the use of probabilistic kernel functions, our approach can extend existing 3D molecular kernel functions, as applied in support vector machines, to build QSAR models. The resulting kernels are valid 4D kernel functions and make the model quality less dependent on the choice of suitable conformations of the molecules. In several experiments, the 4D kernel function extended by our approach performed robustly in comparison to the original 3D-based kernel function. The new method compares the conformational spaces of two molecules within a single kernel evaluation. Hence, the number of kernel evaluations is significantly reduced in comparison to common kernel-based conformational space averaging techniques, which evaluate the kernel for pairs of individual conformations. Additionally, the performance gain of the extended model correlates with the flexibility of the data set, which enables an a priori estimation of the model improvement.
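The idea of comparing two distance distributions in one kernel evaluation can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each atom-pair distance profile is summarized here by a single Gaussian (the paper only says "generative models"), and the probabilistic kernel is taken to be the normalized expected likelihood kernel, i.e. the integral of the product of the two densities. The function names and the distance values (in Å) are hypothetical.

```python
import math


def fit_gaussian(distances):
    """Fit a 1D Gaussian (mean, variance) to one atom-pair distance profile.

    Assumption: a single Gaussian stands in for the generative model.
    """
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    # Floor the variance so that rigid atom pairs do not yield a degenerate model.
    return mean, max(var, 1e-6)


def expected_likelihood_kernel(g1, g2):
    """Closed-form integral of the product of two 1D Gaussian densities."""
    (m1, v1), (m2, v2) = g1, g2
    s = v1 + v2
    return math.exp(-((m1 - m2) ** 2) / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)


def normalized_kernel(g1, g2):
    """Normalize so that a model compared with itself gives 1."""
    k12 = expected_likelihood_kernel(g1, g2)
    k11 = expected_likelihood_kernel(g1, g1)
    k22 = expected_likelihood_kernel(g2, g2)
    return k12 / math.sqrt(k11 * k22)


# Hypothetical distances of one atom pair over six conformers of each molecule.
flexible_pair = [4.1, 4.3, 5.0, 5.6, 6.2, 6.0]
rigid_pair = [4.0, 4.2, 4.1, 4.3, 4.2, 4.1]

model_a = fit_gaussian(flexible_pair)
model_b = fit_gaussian(rigid_pair)

# One kernel evaluation compares the two distributions directly,
# instead of averaging a kernel over all 6 x 6 conformer pairs.
similarity = normalized_kernel(model_a, model_b)
print(f"similarity = {similarity:.3f}")
```

In this sketch the conformers are only needed once, to fit the models; every later comparison works on the fitted parameters, which is the reason a single evaluation per molecule pair suffices.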
