A Relational Learning Approach to Structure-Activity Relationships in Drug Design Toxicity Studies

The development of new therapeutic drugs is a complex and expensive process. Many factors affect the in vivo activity of candidate molecules, and their propensity to cause adverse and toxic effects is recognized as one of the major hurdles behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship (SAR) studies using relational Machine Learning (ML) algorithms have already proved very useful in rational drug design. Despite these ML successes, human expertise remains of the utmost importance in the drug development process. An iterative process that tightly integrates the models developed by ML algorithms with the know-how of medicinal chemistry experts would therefore be a very useful symbiotic approach. In this paper we describe iLogCHEM, a software tool that achieves that goal. The tool allows Relational Learners to be used to identify molecules or molecular fragments with the potential to produce toxic effects, and thus helps streamline drug design in silico. It also allows the expert to guide the search for useful molecules without needing to know the details of the algorithms used. The models produced by the algorithms can be visualized in a graphical interface that is commonly used by researchers in structural biology and medicinal chemistry, and this interface enables the expert to provide feedback to the learning system. The tool also has facilities to handle the similarity bias typical of large chemical databases: the user can filter out similar compounds when assembling a data set. Additionally, we propose ways of providing background knowledge for Relational Learners using the results of Graph Mining algorithms.
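The similarity filtering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or selection strategy iLogCHEM uses, so everything here is an assumption made for illustration: compounds are represented as sets of hypothetical structural feature labels, similarity is the Tanimoto (Jaccard) coefficient, and a greedy pass keeps a compound only if it is below a chosen threshold against every compound already kept. The names `tanimoto`, `filter_similar`, and the `threshold` parameter are hypothetical and are not part of the tool.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(a | b)


def filter_similar(
    compounds: Dict[str, FrozenSet[str]], threshold: float = 0.8
) -> List[str]:
    """Greedy filter (illustrative assumption, not the tool's algorithm).

    Visits compounds in insertion order and keeps one only if its
    similarity to every previously kept compound is below `threshold`.
    """
    kept: List[str] = []
    for name, features in compounds.items():
        if all(tanimoto(features, compounds[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature labels standing in for fingerprint bits.
compounds = {
    "mol_a": frozenset({"f1", "f2", "f3", "f4", "f5"}),
    "mol_b": frozenset({"f1", "f2", "f3", "f4", "f5", "f6"}),  # 5/6 vs mol_a
    "mol_c": frozenset({"g1", "g2"}),                          # no overlap
}

selected = filter_similar(compounds, threshold=0.8)
print(selected)
```

In this toy data set `mol_b` shares five of six features with `mol_a`, so it exceeds the threshold and is dropped, while the unrelated `mol_c` is kept. A greedy pass depends on the order in which compounds are visited; a clustering-based selection would be an alternative design under the same assumptions.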
