Classification of a large anticancer data set by Adaptive Fuzzy Partition

Abstract

An Adaptive Fuzzy Partition (AFP) algorithm, derived from fuzzy logic concepts, was used to classify an anticancer data set of about 1300 compounds subdivided into eight mechanisms of action. AFP classification builds relationships between molecular descriptors and biological activities by dynamically dividing the descriptor hyperspace into a set of fuzzy subspaces. These subspaces are described by simple linguistic rules, from which scores ranging between 0 and 1 can be derived. For each compound, these scores define its degree of membership in each of the mechanisms analyzed. Particular attention was devoted to developing structure–activity relationships of real practical utility. To that end, well-defined and widely accepted protocols were used to validate the models by assessing their robustness and predictive ability. Specifically, after the most relevant descriptors were selected with a genetic algorithm, a training set of 640 compounds was isolated by a rational procedure based on Self-Organizing Maps. The resulting AFP model was then validated with a validation set and, above all, with cross-validation and Y-randomization procedures. Validation scores of about 80% were obtained, underlining the robustness of the model. Moreover, predictive ability was evaluated on 374 test compounds that had not been used to build the model; 77% of them were predicted correctly.
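The idea of fuzzy subspaces that yield membership degrees between 0 and 1 can be illustrated with a minimal sketch. It is not the AFP algorithm itself: the partition here is fixed rather than adaptive, and the two fuzzy sets per descriptor ("low" and "high"), the product T-norm, the two hypothetical mechanism labels, and the toy descriptor values are all assumptions chosen for illustration. Descriptors are assumed to be scaled to the range [0, 1].

```python
from itertools import product

def memberships(x):
    """Fuzzy degrees of a descriptor value x (assumed scaled to [0, 1])."""
    x = min(max(x, 0.0), 1.0)
    return {"low": 1.0 - x, "high": x}

def firing_strengths(descriptors):
    """Degree to which a compound belongs to each fuzzy subspace.

    Each subspace is one linguistic rule such as ("low", "high");
    the degrees are combined with the product T-norm (an assumption).
    """
    per_descriptor = [memberships(x) for x in descriptors]
    strengths = {}
    for labels in product(("low", "high"), repeat=len(descriptors)):
        degree = 1.0
        for m, label in zip(per_descriptor, labels):
            degree *= m[label]
        strengths[labels] = degree
    return strengths

def fit_rule_scores(training_set, classes):
    """Per subspace, the firing-weighted share of training compounds in each class."""
    totals = {}
    for descriptors, label in training_set:
        for rule, degree in firing_strengths(descriptors).items():
            weights = totals.setdefault(rule, {c: 0.0 for c in classes})
            weights[label] += degree
    scores = {}
    for rule, weights in totals.items():
        mass = sum(weights.values())
        scores[rule] = {c: (w / mass if mass > 0 else 0.0) for c, w in weights.items()}
    return scores

def class_memberships(descriptors, rule_scores, classes):
    """Membership degree (0..1) of a compound in each class."""
    strengths = firing_strengths(descriptors)
    total = sum(strengths.values())
    return {
        c: sum(strengths[r] * rule_scores[r][c] for r in strengths) / total
        for c in classes
    }

# Hypothetical toy data: two scaled descriptors, two made-up mechanism labels.
classes = ("mechanism_A", "mechanism_B")
training = [
    ((0.1, 0.2), "mechanism_A"),
    ((0.2, 0.1), "mechanism_A"),
    ((0.9, 0.8), "mechanism_B"),
    ((0.8, 0.9), "mechanism_B"),
]
rule_scores = fit_rule_scores(training, classes)
degrees = class_memberships((0.15, 0.15), rule_scores, classes)
predicted = max(degrees, key=degrees.get)
print(degrees, predicted)
```

In this sketch a compound is assigned to the class with the highest membership degree, while the remaining degrees indicate how strongly it also resembles the other classes.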
