TF3P: A New Three-dimensional Force Fields Fingerprint Learned by Deep Capsular Network.

Molecular fingerprints are the workhorse of ligand-based drug discovery. In recent years, a growing number of papers have reported promising results from using deep neural networks to learn 2D molecular representations as fingerprints. Deep learning is expected to benefit 3D fingerprints as well. Here, we introduce deep learning into 3D small-molecule fingerprints for the first time and present a new fingerprint, the three-dimensional force fields fingerprint (TF3P). TF3P is learned by a deep capsular network whose training requires no labeled datasets for specific predictive tasks. TF3P encodes the 3D force field information of molecules and demonstrates a stronger ability to capture 3D structural changes, to recognize molecules that are alike in 3D but not in 2D, and to identify, from ligand similarity alone, similar targets that other 2D or 3D fingerprints miss. Furthermore, TF3P is compatible with both statistical models (e.g., the similarity ensemble approach) and machine learning models. Altogether, we report TF3P as a new 3D small-molecule fingerprint with a promising future in ligand-based drug discovery. All code is written in Python and available at https://github.com/canisw/tf3p.
