A Binary Ant Colony Optimization Classifier for Molecular Activities

Chemical fingerprints encode the presence or absence of molecular features and are available in many large compound databases. Using a variation of the Ant Colony Optimization (ACO) paradigm, we describe a binary classifier based on selecting features from such fingerprints. We discuss the algorithm and possible cross-validation procedures. As a real-world example, we apply our algorithm to a Plasmodium falciparum inhibition assay and compare its performance with other machine learning paradigms in current use (decision tree induction, random forests, support vector machines, and artificial neural networks). Our algorithm matches these established paradigms in predictive power, yet provides medicinal chemists and basic researchers with easily interpretable results. Furthermore, models generated with our approach are easy to implement and can complement virtual screening by additionally exploiting the precalculated fingerprint information.
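The abstract does not spell out the algorithm, so the following is only a generic sketch of binary ACO feature selection over fingerprint bits, not the authors' method. It assumes (all hypothetical): fingerprints given as 0/1 lists; a conjunctive rule that calls a compound active when every selected bit is set; balanced accuracy (the mean of sensitivity and specificity) as the fitness; and pheromone values used directly as per-bit inclusion probabilities, pulled toward the best-so-far subset with evaporation rate `rho` and clamped to [`tau_min`, `tau_max`] in the style of MAX-MIN bounds. The names `predict`, `balanced_accuracy` and `aco_select` are illustrative.

```python
import random
from typing import Sequence, Set, Tuple


def predict(selected: Set[int], fp: Sequence[int]) -> int:
    """Hypothetical conjunctive rule: 1 (active) iff every selected bit is set in fp."""
    return int(all(fp[i] for i in selected))


def balanced_accuracy(selected: Set[int], fps: Sequence[Sequence[int]],
                      labels: Sequence[int]) -> float:
    """Mean of sensitivity and specificity of the rule over a labelled set."""
    tp = fn = tn = fp_count = 0
    for fp, y in zip(fps, labels):
        p = predict(selected, fp)
        if y == 1:
            tp += p
            fn += 1 - p
        else:
            tn += 1 - p
            fp_count += p
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp_count) if tn + fp_count else 0.0
    return (sens + spec) / 2


def aco_select(fps: Sequence[Sequence[int]], labels: Sequence[int],
               n_ants: int = 20, n_iter: int = 50, rho: float = 0.1,
               tau_min: float = 0.01, tau_max: float = 0.99,
               seed: int = 0) -> Tuple[Set[int], float]:
    """Generic binary ACO: each ant includes bit i with probability tau[i]."""
    rng = random.Random(seed)
    n_bits = len(fps[0])
    tau = [0.5] * n_bits  # start with no preference for any bit
    # Baseline: the empty subset labels every compound active (score 0.5).
    best_set: Set[int] = set()
    best_score = balanced_accuracy(best_set, fps, labels)
    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant samples a subset bit by bit from the pheromone values.
            subset = {i for i in range(n_bits) if rng.random() < tau[i]}
            score = balanced_accuracy(subset, fps, labels)
            if score > best_score:
                best_set, best_score = subset, score
        # Evaporate and reinforce toward the best-so-far subset, then clamp.
        for i in range(n_bits):
            target = 1.0 if i in best_set else 0.0
            tau[i] = (1 - rho) * tau[i] + rho * target
            tau[i] = min(max(tau[i], tau_min), tau_max)
    return best_set, best_score


if __name__ == "__main__":
    # Made-up toy data: activity is determined by bit 2 alone.
    fps = [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1],
           [1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0]]
    labels = [fp[2] for fp in fps]
    selected, score = aco_select(fps, labels)
    print("selected bits:", sorted(selected), "balanced accuracy:", round(score, 3))
```

When the fingerprint bits correspond to defined substructure keys, the selected set reads directly as a list of substructures, which is the kind of interpretability the abstract emphasizes. The sketch scores subsets on the training data only; in practice the score would be estimated with a cross-validation procedure.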
