Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets

Background

Assessing compound toxicity at early stages of the drug discovery process is crucial for dismissing drug candidates that are likely to fail in clinical trials. Screening drug candidates against structural alerts has proved a valuable approach for this task. Structural alerts are chemical fragments associated with a toxicological response, either directly or after the compound has been metabolized (bioactivation). Over the last decades, diverse algorithms have been proposed for the automatic derivation of structural alerts from categorical toxicity data sets.

Results and conclusions

Here we present the Python library bioalerts. It provides functionalities for the automatic derivation of structural alerts from two kinds of data sets: categorical (dichotomous) ones, e.g. toxic/non-toxic, and continuous bioactivity ones, e.g. $K_{i}$ or $\mathrm{pIC}_{50}$ values. bioalerts relies on the RDKit implementation of the circular Morgan fingerprint algorithm to compute chemical substructures, which are derived by considering radial atom neighbourhoods of increasing bond radius. In addition to the derivation of structural alerts, bioalerts provides functionalities for the calculation of unhashed (keyed) Morgan fingerprints. These can be used in predictive bioactivity modelling and have the advantage of allowing a chemically meaningful deconvolution of the chemical space. Finally, bioalerts provides functionalities for the easy visualization of the derived structural alerts.
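The following sketch illustrates what "radial atom neighbourhoods of increasing bond radius" and "unhashed (keyed)" mean. It implements a simplified Morgan-style iteration in plain Python on a hand-written molecular graph. This is not RDKit's algorithm: RDKit uses richer atom invariants, bond types and a hash function, while this version uses element symbols only and ignores bond orders and hydrogens. Here each identifier is kept as a nested tuple instead of being folded into a fixed number of bits. As a result, every fingerprint column corresponds to one explicit, reversible substructure. The molecule format and the names `morgan_environments` and `keyed_fingerprint` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy format: (atom element symbols, adjacency list of atom indices).
Molecule = Tuple[List[str], Dict[int, List[int]]]


def morgan_environments(mol: Molecule, max_radius: int) -> Counter:
    """Count circular atom environments of radius 0..max_radius.

    Radius-0 identifiers are the element symbols. At each iteration an atom's
    identifier becomes (radius, previous identifier, sorted neighbour
    identifiers), so it describes a neighbourhood one bond wider. Identifiers
    are kept as tuples (unhashed), so distinct environments never collide.
    """
    labels, adjacency = mol
    current = {i: (0, labels[i]) for i in range(len(labels))}
    counts = Counter(current.values())
    for radius in range(1, max_radius + 1):
        updated = {}
        for atom, ident in current.items():
            neighbours = tuple(sorted(current[n] for n in adjacency[atom]))
            updated[atom] = (radius, ident, neighbours)
        counts.update(updated.values())
        current = updated
    return counts


def keyed_fingerprint(mols: List[Molecule], max_radius: int):
    """Build a count matrix whose columns are explicit substructure keys."""
    per_mol = [morgan_environments(m, max_radius) for m in mols]
    # Identifiers of equal radius share the same tuple shape, so they sort cleanly.
    keys = sorted({k for counts in per_mol for k in counts})
    matrix = [[counts.get(k, 0) for k in keys] for counts in per_mol]
    return keys, matrix
```

For example, with ethanol encoded as `(["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})` and methanol as `(["C", "O"], {0: [1], 1: [0]})`, both rows share the radius-1 column for "oxygen bonded to one carbon". Because the key is the substructure itself, a model coefficient on that column can be read back as a chemical fragment. This is the kind of deconvolution that hashed, folded fingerprints make harder.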
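The abstract does not specify the statistical criterion used to turn substructures into alerts for categorical (toxic/non-toxic) data. One commonly used option is assumed here for illustration and is not necessarily bioalerts' exact procedure. A substructure is flagged when the fraction of active compounds among those containing it is significantly higher than the overall active rate, according to a one-sided binomial test. The names `binomial_sf` and `derive_alerts`, and the `alpha` and `min_count` thresholds, are hypothetical.

```python
from math import comb
from typing import Dict, Hashable, List, Set


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def derive_alerts(
    substructures: List[Set[Hashable]],
    labels: List[int],
    alpha: float = 0.05,
    min_count: int = 3,
) -> Dict[Hashable, float]:
    """Return substructure keys enriched in actives (label 1) with their p-values.

    substructures[i] is the set of substructure keys present in compound i.
    """
    p_active = sum(labels) / len(labels)  # background rate of actives
    containing: Dict[Hashable, List[int]] = {}
    for keys, label in zip(substructures, labels):
        for key in keys:
            containing.setdefault(key, []).append(label)
    alerts = {}
    for key, key_labels in containing.items():
        n = len(key_labels)
        if n < min_count:
            continue  # too few occurrences to assess enrichment
        k = sum(key_labels)  # actives among compounds containing the key
        p_value = binomial_sf(k, n, p_active)
        if p_value < alpha:
            alerts[key] = p_value
    return alerts
```

The substructure keys can be any hashable identifiers, for instance the keyed Morgan environments discussed above. When thousands of substructures are tested, the raw p-values should be corrected for multiple testing (e.g. Bonferroni) before the fragments are treated as alerts.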
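For continuous endpoints such as $K_{i}$ or $\mathrm{pIC}_{50}$, the abstract likewise does not state which test is applied. One plausible choice is assumed here purely for illustration: a one-sided permutation test on the difference in mean activity between compounds that contain a substructure and compounds that do not. The function name `permutation_enrichment` and its `n_perm` and `seed` parameters are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def permutation_enrichment(
    has_key: List[bool],
    activities: List[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided test: are compounds containing the substructure more potent?

    Returns (observed difference in mean activity, permutation p-value).
    """
    with_key = [a for a, h in zip(activities, has_key) if h]
    without_key = [a for a, h in zip(activities, has_key) if not h]
    if not with_key or not without_key:
        raise ValueError("both groups must contain at least one compound")
    observed = mean(with_key) - mean(without_key)
    rng = random.Random(seed)  # seeded for reproducible p-values
    shuffled = list(has_key)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between substructure and activity
        a = [x for x, h in zip(activities, shuffled) if h]
        b = [x for x, h in zip(activities, shuffled) if not h]
        if mean(a) - mean(b) >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

A permutation test avoids assuming normally distributed activities, which suits small substructure subsets. The cost is that the p-value resolution is limited to about `1 / n_perm`.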
