Fast and accurate prediction of partial charges using Atom-Path-Descriptor-based machine learning

MOTIVATION: Partial atomic charges are widely used to calculate the electrostatic component of the energy in molecular modeling applications such as molecular docking, molecular dynamics simulations and free energy calculations. High-level quantum mechanics (QM) calculations may provide the most accurate estimates of partial charges for small molecules, but they are too time-consuming to process the large numbers of molecules involved in high-throughput virtual screening.

RESULTS: We propose a new molecular descriptor, the Atom Path Descriptor (APD), and develop a set of APD-based machine learning (ML) models that predict partial charges for small molecules with high accuracy. In the APD algorithm, each atom in a molecule's 3D structure is treated as an atom center surrounded by atom layers defined by atom-pair paths, which together characterize the local chemical environment of that atom. Based on the APDs, two representative ensemble ML algorithms, random forest (RF) and extreme gradient boosting (XGBoost), were employed to build regression models for partial charge assignment. The APD-based RF models give better predictions for all atom types than the fingerprint-based models reported in a previous study. More encouragingly, the XGBoost models improve the predictions further, reaching an average root-mean-square error (RMSE) of 0.0116 e on the external test set, much lower than the 0.0195 e reported in that study. These results suggest that the proposed approach is promising for high-accuracy partial charge assignment.
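The abstract does not spell out exactly how the APD layers are constructed, so the following is only a simplified, hypothetical stand-in to make the idea of "atom centers plus path-based atom layers" concrete. It assumes that layer k around a center atom holds every atom whose shortest bond path to the center is k bonds long, and that each layer is summarized by counting element types. Unlike the APD described above, it ignores 3D coordinates entirely, and the function names, the toy ethanol graph and the element vocabulary are all illustrative assumptions.

```python
from collections import Counter, deque


def bfs_distances(adjacency, start):
    """Shortest bond-path (topological) distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def layered_atom_descriptor(elements, adjacency, center, max_layer, element_vocab):
    """Hypothetical layer descriptor: element counts per path-distance layer.

    Layer 0 is the center atom itself; layer k holds atoms exactly k bonds away.
    Returns a flat list of length (max_layer + 1) * len(element_vocab).
    """
    dist = bfs_distances(adjacency, center)
    layers = [Counter() for _ in range(max_layer + 1)]
    for atom, d in dist.items():
        if d <= max_layer:
            layers[d][elements[atom]] += 1
    # Fixed vocabulary order keeps vectors comparable across atoms and molecules.
    return [layers[k][el] for k in range(max_layer + 1) for el in element_vocab]


# Toy example (assumed input format): ethanol, C0-C1-O2 with explicit hydrogens.
ETHANOL_ELEMENTS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ETHANOL_ADJACENCY = {
    0: [1, 3, 4, 5],
    1: [0, 2, 6, 7],
    2: [1, 8],
    3: [0], 4: [0], 5: [0], 6: [1], 7: [1], 8: [2],
}
VOCAB = ["C", "O", "H"]

if __name__ == "__main__":
    # Hydroxyl oxygen, two layers: [O] | [C1, H8] | [C0, H6, H7]
    print(layered_atom_descriptor(ETHANOL_ELEMENTS, ETHANOL_ADJACENCY, 2, 2, VOCAB))
    # prints [0, 1, 0, 1, 0, 1, 1, 0, 2]
```

One such fixed-length vector per atom, paired with that atom's reference QM charge, could serve as a training row for an ensemble regressor such as scikit-learn's `RandomForestRegressor` or the `xgboost` package's `XGBRegressor`; how the authors actually encode their layers and tune these models is not stated in the abstract.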

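The headline accuracy is reported as an average RMSE (in elementary charges, e) over atom types. The sketch below shows one plausible way to compute such a figure: an RMSE per element, RMSE = sqrt(mean((q_pred - q_ref)^2)), followed by an unweighted mean over elements. Whether the paper weights the average by atom count is an assumption not settled by the abstract, and the charges used here are made-up illustrative values, not data from the study.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences (units: e)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def average_rmse_by_element(records):
    """records: iterable of (element, predicted_charge, reference_charge).

    Returns (per-element RMSE dict, unweighted mean of those RMSEs).
    The unweighted mean is an assumption about how "average RMSE" is defined.
    """
    by_element = {}
    for element, pred, ref in records:
        preds, refs = by_element.setdefault(element, ([], []))
        preds.append(pred)
        refs.append(ref)
    per_element = {el: rmse(p, r) for el, (p, r) in by_element.items()}
    return per_element, sum(per_element.values()) / len(per_element)


# Hypothetical predictions vs. reference charges (not from the paper).
RECORDS = [
    ("C", -0.12, -0.13),  # error +0.01
    ("C", 0.05, 0.06),    # error -0.01
    ("O", -0.60, -0.63),  # error +0.03
    ("O", -0.55, -0.52),  # error -0.03
]

if __name__ == "__main__":
    per_element, average = average_rmse_by_element(RECORDS)
    print({el: round(v, 4) for el, v in per_element.items()}, round(average, 4))
```

With these toy values the carbon and oxygen RMSEs are about 0.01 e and 0.03 e, giving an unweighted average of about 0.02 e; an atom-count-weighted average would coincide here only because both elements have the same number of atoms.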