Computing similarity between structural environments of mutagenicity alerts

This article describes a method for generating molecular fingerprints from the structural environments of mutagenicity alerts and for calculating the similarity between them. The approach was used to improve the classification accuracy of alerts and to search for structurally similar analogues of an alerting chemical. It builds fingerprints from molecular fragments in the vicinity of an alert and automatically accounts for the activating and deactivating (mitigating) features that accurate predictions require. The study also demonstrates the usefulness of transfer learning: a distributed representation of chemical fragments was first trained on millions of unlabelled chemicals and then used to generate fingerprints and run similarity searches on smaller data sets labelled with Ames test outcomes. The distributed fingerprints gave better prediction performance and greater coverage than traditional binary fingerprints. The methodology was applied to four common mutagenic functionalities: primary aromatic amine, aromatic nitro, epoxide, and alkyl chloride. The effects of several hyperparameters (similarity threshold, number of neighbours, and size of the alert environment) on the prediction accuracy and test coverage of the k-nearest neighbours method are also described.
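The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: similarity between alert-environment fingerprints, then a k-nearest neighbours prediction with a similarity threshold that controls coverage. It assumes binary fingerprints are sets of fragment identifiers compared by Tanimoto similarity and distributed fingerprints are dense vectors compared by cosine similarity. The fragment labels, the function names, the default values of `k` and `threshold`, and the toy data are all hypothetical and are not taken from the article.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fragment identifiers
    (an assumed stand-in for a binary fingerprint)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(u, v):
    """Cosine similarity between two dense vectors
    (an assumed stand-in for a distributed fingerprint)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def knn_predict(query, training, similarity, k=3, threshold=0.5):
    """Fraction of mutagenic labels among the k most similar training
    environments whose similarity is at least `threshold`.

    Returns None when no neighbour passes the threshold, i.e. the
    query is treated as outside the coverage of the training set.
    """
    scored = sorted(
        ((similarity(query, fp), label) for fp, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbours = [(s, y) for s, y in scored[:k] if s >= threshold]
    if not neighbours:
        return None
    return sum(y for _, y in neighbours) / len(neighbours)


# Hypothetical alert environments for an aromatic amine (1 = Ames positive).
training = [
    ({"ArNH2", "c:c", "c-OMe"}, 1),
    ({"ArNH2", "c:c", "c-SO3H"}, 0),
    ({"ArNH2", "c:c", "c-OMe", "c-Cl"}, 1),
]
query = {"ArNH2", "c:c", "c-OMe"}

score = knn_predict(query, training, tanimoto, k=2, threshold=0.6)
uncovered = knn_predict({"unseen-fragment"}, training, tanimoto)
```

In this sketch, raising `threshold` makes predictions rely on closer analogues but returns `None` for more queries, which is the accuracy versus coverage trade-off the abstract refers to. Passing `cosine` with vector fingerprints in place of `tanimoto` leaves the rest of the procedure unchanged.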
