Predicting Small Molecule Transfer Free Energies by Combining Molecular Dynamics Simulations and Deep Learning

Accurately predicting small molecule partitioning and hydrophobicity is critical in the drug discovery process. A cell, and the human body as a whole, contains many heterogeneous chemical environments. For example, drugs must cross the hydrophobic cellular membrane to reach their intracellular targets, and hydrophobicity is an important driving force for drug-protein binding. Atomistic molecular dynamics (MD) simulations are routinely used to calculate free energies for small molecules binding to proteins, crossing lipid membranes, and undergoing solvation, but they are computationally expensive. Machine learning (ML) and empirical methods are also used throughout drug discovery, but they rely on experimental data, which limits their domain of applicability. Here, we run extensive atomistic MD simulations to calculate 15,000 small molecule free energies of transfer and use them to train multiple ML models that predict the free energy. The simulations determine the relative free energy of each small molecule in water, at an interface, and in bulk hydrocarbon. The ML models are then trained and tested on our MD free energies. We show that a spatial graph neural network model achieves the highest accuracy, followed closely by a 3D-convolutional neural network, whereas shallow learning based on the chemical fingerprint is significantly less accurate. Our best ML model achieves a mean absolute error of ~4 kJ/mol relative to the MD calculations. We also show that including data from the MD simulations improves the predictions, test the transferability of each model to a diverse set of molecules, and show that multi-task learning improves the predictions. This work provides insight into the hydrophobicity of small molecules and into ML cheminformatics modelling, and our dataset will be useful for designing and testing future ML cheminformatics methods.
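
As an illustration of the two quantities named above, the following minimal Python sketch computes a free energy of transfer as the difference between relative free energies in two environments, and a mean absolute error between MD reference values and ML predictions. All numbers, the function names `transfer_free_energy` and `mean_absolute_error`, and the choice of water as the zero of the relative free energy are assumptions made for illustration; they are not values or code from this work.

```python
from statistics import mean


def transfer_free_energy(g_from, g_to):
    """Free energy of transfer (kJ/mol) between two environments,
    taken as the difference of their relative free energies."""
    return g_to - g_from


def mean_absolute_error(reference, predicted):
    """Mean absolute error between reference and predicted values."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    return mean(abs(r - p) for r, p in zip(reference, predicted))


# Hypothetical relative free energies (kJ/mol) for one molecule,
# with water assumed to be the reference state (0.0).
molecule = {"water": 0.0, "interface": -12.0, "hydrocarbon": -5.0}

dg_water_to_interface = transfer_free_energy(
    molecule["water"], molecule["interface"]
)
dg_water_to_hydrocarbon = transfer_free_energy(
    molecule["water"], molecule["hydrocarbon"]
)

# Hypothetical MD reference values and ML predictions (kJ/mol).
md_values = [-5.0, 10.0, -20.0, 3.0]
ml_values = [-1.0, 6.0, -24.0, 7.0]

mae = mean_absolute_error(md_values, ml_values)

print(f"water -> interface:   {dg_water_to_interface:.1f} kJ/mol")
print(f"water -> hydrocarbon: {dg_water_to_hydrocarbon:.1f} kJ/mol")
print(f"MAE vs. MD:           {mae:.1f} kJ/mol")
```

In this sketch a negative transfer free energy means the molecule is more stable in the target environment than in water, and the error metric treats the MD values as the ground truth, mirroring how the ML models are evaluated against the MD calculations.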