On Combining Recursive Partitioning and Simulated Annealing To Detect Groups of Biologically Active Compounds

Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining the large, diverse data sets that result from high-throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on one descriptor at a time. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates the most active, chemically similar group of compounds from the rest. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the combination of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
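The descriptor-combination search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring criterion (excess mean activity of the group, weighted by the square root of group size), the swap move, the geometric cooling schedule, and all names (`group_score`, `anneal_combo`, `toy_data`) are assumptions chosen for illustration, and the data are synthetic.

```python
import math
import random

NO_GROUP = -1e9  # assumed penalty for combinations matching too few compounds


def group_score(X, y, combo, min_size=3):
    """Score the compounds that carry every descriptor in `combo`.

    Assumed criterion (not taken from the paper): how far the group's mean
    activity exceeds the overall mean, weighted by sqrt(group size).
    """
    members = [yi for row, yi in zip(X, y) if all(row[j] for j in combo)]
    if len(members) < min_size:
        return NO_GROUP
    overall = sum(y) / len(y)
    return (sum(members) / len(members) - overall) * math.sqrt(len(members))


def anneal_combo(X, y, k=2, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over k-descriptor combinations (maximizes the score)."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    current = rng.sample(range(n_desc), k)
    cur_score = group_score(X, y, current)
    best, best_score = tuple(sorted(current)), cur_score
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling factor
    temp = t_start
    for _ in range(steps):
        # Proposal: swap one descriptor in the combination for one outside it.
        candidate = list(current)
        outside = [j for j in range(n_desc) if j not in current]
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_score = group_score(X, y, candidate)
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = tuple(sorted(current)), cur_score
        temp *= cooling
    return best, best_score


def toy_data(n=200, n_desc=12, seed=1):
    """Synthetic fingerprints: compounds with descriptors 2 AND 5 are active."""
    rng = random.Random(seed)
    X = [[int(rng.random() < 0.3) for _ in range(n_desc)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) + (5.0 if row[2] and row[5] else 0.0) for row in X]
    return X, y


X, y = toy_data()
best_combo, best_score = anneal_combo(X, y)
print(best_combo, round(best_score, 2))
```

In a recursive partitioning setting, a search like this would presumably be repeated at each node: the compounds matching the best combination form one child, and the search is rerun on the remainder. The abstract does not spell out that recursion or the split criterion, so both are left out of the sketch.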
