Generalized Fragment-Substructure Based Property Prediction Method

Demand for fast, accurate predictors of pharmaceutically important properties is growing, driven by high-throughput screening, in silico screening, and the need to identify potential pharmacokinetic issues before drug candidates advance to the more expensive clinical development stages. A novel method that builds predictive models by decomposing 2D structures into component structural fragments is used to model logP, water solubility (logS), and melting point. Because the method is fragment-oriented, it facilitates understanding of how molecules might be altered to improve the desired properties. The 2D structure-based descriptor is computed by analyzing the target molecules with a substructure searching algorithm and a set of fragments selected for chemical and pharmaceutical relevance. The resulting descriptor values are combined with partial least squares (PLS) to create predictive models. The correlation coefficients achieved are 0.86 for logP (SE = 0.68), 0.73 for logS (SE = 0.89), and 0.64 for melting point (SE = 48.9 degrees) over diverse data sets of 11,447; 2,427; and 5,598 molecules, respectively. The models were verified with test sets of compounds not included in the training sets.
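
The modeling step described above (fragment counts combined with partial least squares) can be illustrated with a minimal sketch. It assumes the substructure search has already produced one count per fragment per molecule, and it uses a plain single-response PLS (PLS1, NIPALS-style) written with the Python standard library only. The fragment names, counts, property values, and function names (`fit_pls1`, `predict_pls1`) are hypothetical and synthetic; they are not taken from the data sets or the fragment set used in this work, and this is not necessarily the PLS variant the authors used.

```python
# Sketch only: PLS1 (NIPALS-style) on fragment-count descriptors.
# All fragment names, counts, and property values are synthetic.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered descriptors
    f = [v - y_mean for v in y]                                # centered property
    components = []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space most covariant with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]                   # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = dot(f, t) / tt                                     # property loading
        # Deflate descriptors and property before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}

def predict_pls1(model, x):
    """Predict the property for one molecule from its fragment counts."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Hypothetical fragment set and per-molecule counts (rows = molecules).
FRAGMENTS = ["aromatic ring", "hydroxyl", "carboxylic acid"]
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
# Synthetic property, built as 1.0 + 2.0*ring - 0.5*hydroxyl - 1.0*acid.
y = [3.0, 0.5, 0.0, 2.5, 4.0, 1.0]

model = fit_pls1(X, y, n_components=3)
prediction = predict_pls1(model, [2, 1, 0])  # counts for an unseen molecule
```

Because each descriptor column corresponds to a named fragment, the fitted weights and loadings can be read fragment by fragment, which is the interpretability benefit the abstract points to. In practice the number of components would be chosen by validation on held-out compounds rather than fixed in advance.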