Efficient lipophilicity prediction of molecules employing deep-learning models

Abstract Lipophilicity, expressed as logP, is an important physicochemical property and an indicator of the absorption, distribution, metabolism and elimination characteristics of drugs. It is one of the major factors that decide whether a molecule can become a successful drug. Mol2vec is a convenient unsupervised machine learning technique that produces high-dimensional vector representations of molecules and their substructures. This work aims to simplify the prediction of logP values with a high degree of accuracy by pairing Deep Learning (DL) models with Mol2vec. We demonstrate empirically that the described DL models paired with Mol2vec achieve results that are much better than those of conventional ML techniques as well as more complex and recent algorithms such as Message-passing Neural Networks (MPNN), Graph Convolution (GC) and Spatial Graph embedding (C-SGEN). The RMSE (Root Mean Square Error) score of our ensemble model is among the best reported so far in the literature. The methods elaborated in this paper are simple yet effective in predicting logP values to a great degree of accuracy, owing to the use of Mol2vec and standard TensorFlow operators. The models employed here can be coded and maintained with much more ease than MPNN, C-SGEN or GC.
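As a minimal illustration of the evaluation metric named in the abstract, the sketch below computes RMSE for two sets of logP predictions and for their ensemble. The abstract does not state how the ensemble combines its member models, so the unweighted average used here is an assumption. The function names `rmse` and `ensemble_mean` are hypothetical, and the logP values are invented toy numbers, not results from the paper.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ensemble_mean(predictions):
    """Combine per-model predictions by an unweighted average.

    Assumption: simple averaging stands in for whatever combination
    rule the paper's ensemble actually uses.
    """
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# Invented toy logP values, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
pred_a = np.array([1.5, 2.5, 2.5, 3.5])  # hypothetical model A
pred_b = np.array([0.5, 2.5, 3.5, 4.5])  # hypothetical model B

pred_ens = ensemble_mean([pred_a, pred_b])

print("RMSE model A :", rmse(y_true, pred_a))
print("RMSE model B :", rmse(y_true, pred_b))
print("RMSE ensemble:", rmse(y_true, pred_ens))
```

In this toy case the errors of the two models partly cancel, so the averaged prediction has a lower RMSE than either member. Real models need not behave this way, and the gain depends on how correlated their errors are.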
