Emerging Pattern Mining To Aid Toxicological Knowledge Discovery

Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. Identifying structural features associated with toxicological activity can be time-consuming and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets, designed to help expedite alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns, it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.
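As a rough illustration of the emerging pattern idea, the sketch below scores sets of fragment descriptors by their growth rate, that is, the ratio of a pattern's support among active compounds to its support among inactive compounds. It is a brute-force enumeration written for clarity, not the contrast pattern tree algorithm used in this work. The fragment names, the toy data, the thresholds, and the function names (`support`, `growth_rate`, `emerging_patterns`) are all assumptions made for the example.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item in the pattern."""
    if not transactions:
        return 0.0
    return sum(pattern <= t for t in transactions) / len(transactions)


def growth_rate(pattern, actives, inactives):
    """Support among actives divided by support among inactives."""
    sup_active = support(pattern, actives)
    sup_inactive = support(pattern, inactives)
    if sup_inactive == 0:
        # Present only in actives: a "jumping" emerging pattern.
        return float("inf") if sup_active > 0 else 0.0
    return sup_active / sup_inactive


def emerging_patterns(actives, inactives, min_growth=2.0,
                      min_support=0.5, max_size=2):
    """Brute-force search over small item sets (illustrative only)."""
    items = sorted(set().union(*actives))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            if support(pattern, actives) < min_support:
                continue
            rate = growth_rate(pattern, actives, inactives)
            if rate >= min_growth:
                found[pattern] = rate
    return found


# Hypothetical toy data: each compound is a set of fragment descriptors.
actives = [
    frozenset({"aromatic_nitro", "benzene"}),
    frozenset({"aromatic_nitro", "benzene", "halogen"}),
    frozenset({"epoxide"}),
    frozenset({"aromatic_amine", "benzene"}),
]
inactives = [
    frozenset({"benzene"}),
    frozenset({"benzene", "halogen"}),
    frozenset({"carboxylic_acid"}),
    frozenset({"benzene", "carboxylic_acid"}),
]

patterns = emerging_patterns(actives, inactives)
for pattern, rate in patterns.items():
    print(sorted(pattern), rate)
```

In this toy data, `benzene` is equally common in both classes (growth rate 1.0) and is filtered out, while the patterns containing `aromatic_nitro` occur only among the actives. Compounds sharing such a pattern could then be grouped together, which is the intuition behind the pattern-defined clusters described above.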
