Molecular Similarity Searching Using Atom Environments, Information-Based Feature Selection, and a Naïve Bayesian Classifier

A novel technique for similarity searching is introduced. Molecules are represented by atom environments, which are passed to an information-gain-based feature-selection step. A naïve Bayesian classifier is then employed for compound classification. The method is evaluated by its ability to retrieve five sets of active molecules seeded in the MDL Drug Data Report (MDDR) database. In comparison experiments, the algorithm outperforms all other retrieval methods assessed here, which use two- and three-dimensional descriptors, and it offers insight into the significance of structural components for binding.
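The three-stage pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each molecule is assumed to arrive as a set of precomputed atom-environment identifiers (the strings "a", "b", "c" are hypothetical placeholders; the encoding itself is not shown). Information gain is computed for the binary presence or absence of each feature. A Bernoulli naïve Bayes model with Laplace smoothing (an assumed variant; the abstract does not specify one) then scores molecules by their log-odds of being active. The names `select_features`, `NaiveBayesScorer` and `rank_database` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence

# Assumption: a molecule is a set of atom-environment identifiers (opaque strings).
Molecule = FrozenSet[str]


def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (bits) of a two-class active/inactive distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(mols: Sequence[Molecule], labels: Sequence[bool], feature: str) -> float:
    """Reduction in class entropy from splitting on presence/absence of one feature."""
    n = len(mols)
    n_pos = sum(labels)
    with_pos = sum(1 for m, y in zip(mols, labels) if y and feature in m)
    with_neg = sum(1 for m, y in zip(mols, labels) if not y and feature in m)
    n_with = with_pos + with_neg
    h_with = entropy(with_pos, with_neg)
    h_without = entropy(n_pos - with_pos, (n - n_pos) - with_neg)
    conditional = (n_with / n) * h_with + ((n - n_with) / n) * h_without
    return entropy(n_pos, n - n_pos) - conditional


def select_features(mols: Sequence[Molecule], labels: Sequence[bool], k: int) -> List[str]:
    """Keep the k highest-gain features (the stable sort keeps ties in alphabetical order)."""
    vocab = sorted(set().union(*mols))
    ranked = sorted(vocab, key=lambda f: information_gain(mols, labels, f), reverse=True)
    return ranked[:k]


class NaiveBayesScorer:
    """Hypothetical Bernoulli naive Bayes over the selected features, Laplace-smoothed."""

    def __init__(self, features: Sequence[str], alpha: float = 1.0) -> None:
        self.features = list(features)
        self.alpha = alpha
        self.log_prior = 0.0
        self.p_active: Dict[str, float] = {}
        self.p_inactive: Dict[str, float] = {}

    def fit(self, mols: Sequence[Molecule], labels: Sequence[bool]) -> "NaiveBayesScorer":
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos  # both classes must be non-empty
        self.log_prior = math.log(n_pos / n_neg)
        for f in self.features:
            c_pos = sum(1 for m, y in zip(mols, labels) if y and f in m)
            c_neg = sum(1 for m, y in zip(mols, labels) if not y and f in m)
            # Smoothed P(feature present | class); alpha > 0 keeps it strictly in (0, 1).
            self.p_active[f] = (c_pos + self.alpha) / (n_pos + 2 * self.alpha)
            self.p_inactive[f] = (c_neg + self.alpha) / (n_neg + 2 * self.alpha)
        return self

    def score(self, mol: Molecule) -> float:
        """Log-odds of active vs. inactive, assuming features are conditionally independent."""
        s = self.log_prior
        for f in self.features:
            pa, pi = self.p_active[f], self.p_inactive[f]
            if f in mol:
                s += math.log(pa / pi)
            else:
                s += math.log((1 - pa) / (1 - pi))
        return s


def rank_database(model: NaiveBayesScorer, database: Sequence[Molecule]) -> List[int]:
    """Indices of database molecules, most active-like first."""
    return sorted(range(len(database)), key=lambda i: model.score(database[i]), reverse=True)


# Toy training set with hypothetical feature identifiers.
train = [frozenset({"a", "c"}), frozenset({"a"}), frozenset({"b"}), frozenset({"b", "c"})]
labels = [True, True, False, False]
feats = select_features(train, labels, k=2)  # ['a', 'b']
model = NaiveBayesScorer(feats).fit(train, labels)
order = rank_database(model, [frozenset({"b"}), frozenset({"a", "c"})])  # [1, 0]
```

Sorting the database by descending score yields the retrieval list. Because each selected feature adds its own log-ratio term to the score in this sketch, those per-feature terms can be inspected to see which structural components push a molecule toward the active class.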