Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models

The volume of high-throughput screening (HTS) data has increased considerably since the advent of automated biochemical and cell-based assays. This information-rich data source offers substantial opportunities for repurposing through data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput screening fingerprints (HTSFPs), a new type of descriptor that captures molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. So far, however, studies combining HTSFPs with machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and the modeling of compounds' biochemical assay activities toward a panel of target proteins has not been reported. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold hopping potential. The performance of the two HTSFP models is reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP (extended-connectivity fingerprints). Moreover, we investigated the effect of HTS false positives and false negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance but diverse hits with very little overlap, demonstrating the orthogonality of bioactivity profile-based descriptors and structural descriptors. Modeling compound activity data using ECFPs together with HTSFPs therefore increases the scaffold hopping potential of the predictive models.
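As a rough illustration of two ideas in the abstract, the sketch below compiles per-assay results into a fixed-length bioactivity fingerprint and measures how much two hit lists overlap. It is a minimal sketch under stated assumptions, not the procedure used in the article: the z-score input, the activity threshold of 3.0, the use of `None` for untested assays, the Jaccard index as the overlap measure, and the function names `build_htsfp` and `hit_overlap` are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_htsfp(
    records: List[Tuple[str, str, float]],
    assays: List[str],
    threshold: float = 3.0,  # assumed activity cutoff on a z-score scale
) -> Dict[str, List[Optional[int]]]:
    """Compile (compound, assay, z_score) records into per-compound fingerprints.

    Each fingerprint has one position per assay: 1 = active, 0 = inactive,
    None = compound not tested in that assay (assumed encoding).
    """
    position = {assay: i for i, assay in enumerate(assays)}
    fingerprints: Dict[str, List[Optional[int]]] = {}
    for compound, assay, z_score in records:
        fp = fingerprints.setdefault(compound, [None] * len(assays))
        fp[position[assay]] = 1 if z_score >= threshold else 0
    return fingerprints


def hit_overlap(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Jaccard index of two hit sets; 0.0 when both sets are empty."""
    union = hits_a | hits_b
    return len(hits_a & hits_b) / len(union) if union else 0.0


# Toy data: two compounds, three assays, one assay never run on either compound.
records = [("cpd1", "A1", 4.2), ("cpd1", "A2", 0.3), ("cpd2", "A2", 5.0)]
fingerprints = build_htsfp(records, ["A1", "A2", "A3"])

# Hypothetical hit lists from an HTSFP-based model and an ECFP-based model.
overlap = hit_overlap({"cpd1", "cpd2"}, {"cpd2", "cpd3"})
```

A low overlap value between the hit lists of two models of similar accuracy is the kind of signal the abstract describes as orthogonality between bioactivity-based and structural descriptors.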
