Implementation of multiple-instance learning in drug activity prediction

Background
In the context of drug discovery and development, considerable effort has been devoted to determining which conformers of a given molecule are responsible for its observed biological activity. In this work we aimed to predict bioactive conformers using a variant of supervised learning called multiple-instance learning. A molecule, treated as a bag of conformers, is biologically active if and only if at least one of its conformers, each treated as an instance, is responsible for the observed bioactivity; a molecule is inactive if none of its conformers is responsible for it. The implementation requires instance-based embedding together with joint feature selection and classification. The goal of the present project was to implement multiple-instance learning for drug activity prediction and then to identify the bioactive conformers of each molecule.

Methods
We encoded the 3-dimensional structures as pharmacophore fingerprints, which are binary strings, and performed instance-based embedding using calculated dissimilarity distances. Four dissimilarity measures were employed and their performances were compared. A 1-norm SVM was used for joint feature selection and classification. The approach was applied to four data sets, and for each data set the best model was taken to be the one built with the dissimilarity measure that yielded the smallest number of selected features.

Results
The predictive ability of the proposed approach was compared with that of three classical predictive models without instance-based embedding. Based on external validation, the proposed approach produced the best predictive model for one data set and the second-best models for the remaining three. To validate the ability of the approach to find bioactive conformers, 12 small molecules with co-crystallized structures were seeded into one data set; 10 of the 12 co-crystallized structures were identified as significant conformers by the proposed approach.

Conclusions
The proposed approach was shown not to suffer from overfitting and was highly competitive with classical predictive models, making it a powerful tool for drug activity prediction. The approach was also validated as a useful method for identifying bioactive conformers.
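The labeling rule stated above, under which a molecule is active if and only if at least one of its conformers accounts for the activity, is the standard multiple-instance assumption. A minimal Python sketch follows; the function names and the toy per-conformer flags are hypothetical, and in practice conformer-level activity is never observed, which is why only molecule-level labels can be used for training.

```python
from typing import Sequence


def bag_label(conformer_active: Sequence[bool]) -> bool:
    """Multiple-instance rule: a molecule (bag) is active if and only if
    at least one of its conformers (instances) is active."""
    return any(conformer_active)


def bag_score(conformer_scores: Sequence[float]) -> float:
    """Real-valued form of the same rule: a bag scores as high as its best
    instance, so bag_score(s) > t holds exactly when any(si > t for si in s)."""
    return max(conformer_scores)


if __name__ == "__main__":
    # Hypothetical molecules with hypothetical per-conformer activity flags.
    molecules = {
        "mol_A": [False, True, False],  # one bioactive conformer -> active
        "mol_B": [False, False],        # no bioactive conformer -> inactive
    }
    print({mol: bag_label(flags) for mol, flags in molecules.items()})
    # {'mol_A': True, 'mol_B': False}
```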
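The abstract does not name the four dissimilarity measures that were compared. Purely as an illustration, the sketch below implements four common distances for binary fingerprints (Tanimoto, Dice, normalized Hamming and cosine), each written in terms of the usual counts a, b and c of bits set in both fingerprints, only in the first, or only in the second. Whether these coincide with the measures used in the study is an assumption.

```python
import math
from typing import Sequence

# A binary pharmacophore fingerprint, stored as a sequence of 0/1 values.
Fingerprint = Sequence[int]


def bit_counts(x: Fingerprint, y: Fingerprint) -> tuple[int, int, int, int]:
    """Return (a, b, c, n): bits set in both, only in x, only in y; n = length."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if yi and not xi)
    return a, b, c, len(x)


def tanimoto_distance(x: Fingerprint, y: Fingerprint) -> float:
    a, b, c, _ = bit_counts(x, y)
    union = a + b + c
    return 0.0 if union == 0 else 1.0 - a / union  # two empty prints: identical


def dice_distance(x: Fingerprint, y: Fingerprint) -> float:
    a, b, c, _ = bit_counts(x, y)
    denom = 2 * a + b + c
    return 0.0 if denom == 0 else 1.0 - 2 * a / denom


def hamming_distance(x: Fingerprint, y: Fingerprint) -> float:
    """Fraction of bit positions that differ."""
    _, b, c, n = bit_counts(x, y)
    return (b + c) / n if n else 0.0


def cosine_distance(x: Fingerprint, y: Fingerprint) -> float:
    a, b, c, _ = bit_counts(x, y)
    if a + b == 0 and a + c == 0:
        return 0.0  # both fingerprints empty
    if a + b == 0 or a + c == 0:
        return 1.0  # exactly one fingerprint empty
    return 1.0 - a / math.sqrt((a + b) * (a + c))


if __name__ == "__main__":
    x = (1, 1, 0, 1, 0)
    y = (1, 0, 0, 1, 1)  # here a = 2, b = 1, c = 1
    for f in (tanimoto_distance, dice_distance, hamming_distance, cosine_distance):
        print(f.__name__, round(f(x, y), 3))
```

The measures differ mainly in how they treat bits that are off in both fingerprints: Hamming counts them as agreement, while the other three ignore them, which matters for sparse pharmacophore fingerprints.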
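One plausible reading of the instance-based embedding, modeled on MILES (Chen et al., 2006), represents each molecule by its dissimilarities to every conformer in the training pool. In the sketch below, coordinate k of a molecule's vector is the smallest distance between any of its conformers and pooled conformer k. Taking a minimum over distances (MILES itself takes a maximum over Gaussian similarities), the choice of normalized Hamming distance as a stand-in, and the function names are all assumptions.

```python
from typing import Callable, Sequence

Fingerprint = Sequence[int]   # binary conformer fingerprint (0/1 values)
Bag = Sequence[Fingerprint]   # one molecule = the fingerprints of its conformers
Distance = Callable[[Fingerprint, Fingerprint], float]


def hamming(x: Fingerprint, y: Fingerprint) -> float:
    """Normalized Hamming distance, used here as a stand-in dissimilarity."""
    return sum(xi != yi for xi, yi in zip(x, y)) / len(x)


def embed_bag(bag: Bag, pool: Sequence[Fingerprint], dist: Distance) -> list[float]:
    """Coordinate k = smallest distance from any conformer in `bag` to pool[k]."""
    return [min(dist(conf, ref) for conf in bag) for ref in pool]


def embed_all(bags: Sequence[Bag], dist: Distance = hamming):
    """Pool every training conformer, then embed each molecule against the pool.

    Returns (pool, vectors); vectors[i] has one entry per pooled conformer,
    so each feature of the embedded data corresponds to one conformer.
    """
    pool = [conf for bag in bags for conf in bag]
    return pool, [embed_bag(bag, pool, dist) for bag in bags]


if __name__ == "__main__":
    bags = [
        [(1, 1, 0, 0), (0, 1, 1, 0)],  # hypothetical molecule, two conformers
        [(0, 0, 1, 1)],                # hypothetical molecule, one conformer
    ]
    pool, vectors = embed_all(bags)
    print(vectors)  # [[0.0, 0.0, 0.5], [1.0, 0.5, 0.0]]
```

Because every feature is tied to a specific training conformer, a classifier that keeps only a few features is, in effect, pointing at a few conformers.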
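A 1-norm SVM replaces the usual squared 2-norm penalty on the weights with their 1-norm, which drives many weights exactly to zero, so fitting the classifier and selecting features happen in a single optimization. A minimal sketch using NumPy and SciPy's `linprog` follows. The linear-programming formulation is the standard one for the 1-norm SVM; the default `C`, the zero tolerance and the function names are assumptions, not details taken from the study.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(X, y, C=1.0):
    """Fit a linear 1-norm SVM by linear programming.

    Solves   min_{w,b,xi}  ||w||_1 + C * sum_i xi_i
             s.t.  y_i * (w . x_i + b) >= 1 - xi_i,   xi_i >= 0,
    with labels y_i in {-1, +1}. Writing w = u - v (u, v >= 0) makes the
    objective linear. Returns the weight vector w and intercept b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Variable vector: [u (d), v (d), b (1), xi (n)]
    cost = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # Margin constraints rewritten in "<=" form:
    #   -y_i x_i.u + y_i x_i.v - y_i b - xi_i <= -1
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:d] - res.x[d:2 * d]
    return w, float(res.x[2 * d])


def selected_features(w, tol=1e-6):
    """Indices of features with non-negligible weight: the selected features."""
    return [int(j) for j in np.flatnonzero(np.abs(w) > tol)]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 carries no signal.
    X = [[1, 0], [1, 1], [-1, 0], [-1, 1]]
    y = [1, 1, -1, -1]
    w, b = one_norm_svm(X, y, C=10.0)
    print(selected_features(w))  # [0]
```

Smaller values of `C` make the 1-norm penalty relatively stronger and therefore give sparser weight vectors, at the cost of more margin violations.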
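The model-selection rule described above (keep the dissimilarity measure whose 1-norm SVM selects the fewest features) and the subsequent identification of significant conformers can be sketched as below. The sketch assumes, as in a MILES-style embedding, that feature k corresponds to the k-th conformer of the training pool; the measure names, the selected indices and the (molecule, conformer) identifiers are hypothetical.

```python
from typing import Mapping, Sequence


def choose_measure(selected_by_measure: Mapping[str, Sequence[int]]) -> str:
    """Pick the dissimilarity measure whose model selected the fewest features.

    Ties go to the measure listed first (min() keeps the first minimum)."""
    return min(selected_by_measure, key=lambda m: len(selected_by_measure[m]))


def significant_conformers(
    selected: Sequence[int], pool_ids: Sequence[tuple[str, int]]
) -> list[tuple[str, int]]:
    """Map selected feature indices back to (molecule, conformer) identifiers.

    Assumes feature k of the embedding corresponds to pooled conformer k."""
    return [pool_ids[k] for k in selected]


if __name__ == "__main__":
    # Hypothetical per-measure feature selections (indices into the pool).
    selections = {
        "tanimoto": [0, 3, 5],
        "dice": [0, 5],
        "hamming": [1, 2, 4, 5],
    }
    pool_ids = [("mol_A", 0), ("mol_A", 1), ("mol_B", 0),
                ("mol_B", 1), ("mol_C", 0), ("mol_C", 1)]
    best = choose_measure(selections)
    print(best, significant_conformers(selections[best], pool_ids))
    # dice [('mol_A', 0), ('mol_C', 1)]
```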
