Molecular hashkeys: a novel method for molecular characterization and its application for predicting important pharmaceutical properties of molecules.

We define a novel numerical molecular representation, called the molecular hashkey, that captures sufficient information about a molecule to predict pharmaceutically interesting properties directly from three-dimensional molecular structure. The molecular hashkey represents molecular surface properties as a linear array of pairwise surface-based comparisons of the target molecule against a common 'basis set' of molecules. Hashkey-measured molecular similarity correlates well with direct methods of measuring molecular surface similarity. Using a simple machine-learning technique with the molecular hashkeys, we show that the octanol-water partition coefficient, log P, can be predicted accurately. Using more sophisticated learning techniques, we show that an accurate model of intestinal absorption for a set of drugs can be constructed from the same hashkeys used in the preceding experiments. Once a set of molecular hashkeys has been calculated, training and testing property-based models with it is very fast. Further, the amount of data required for model construction is very small: neural network-based hashkey models trained on data sets as small as 30 molecules yield statistically significant predictions of molecular properties. Because large data sets are not required, the approach is well suited to predicting pharmaceutically relevant molecular parameters for which data generation is expensive and slow. Molecular hashkeys coupled with machine-learning techniques can yield models that predict key pharmacological aspects of biologically important molecules and should therefore be important in the design of effective therapeutics.
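
The hashkey construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the surface-based comparison is replaced by a hypothetical placeholder (`toy_similarity`) acting on made-up two-number descriptors, and the unspecified "simple machine-learning technique" is stood in for by a k-nearest-neighbour average, which is an assumption chosen only for brevity. All function names and all numeric values are illustrative, not data from the study.

```python
import math

def toy_similarity(a, b):
    """Hypothetical stand-in for a surface-based comparison of two molecules.
    Returns a value in (0, 1], with 1.0 meaning identical descriptors."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def hashkey(molecule, basis_set, similarity=toy_similarity):
    """Linear array of pairwise comparisons of one molecule against
    every member of a common basis set."""
    return [similarity(molecule, basis) for basis in basis_set]

def hashkey_distance(key_a, key_b):
    """Euclidean distance between two hashkeys (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(key_a, key_b)))

def predict_knn(query_key, train_keys, train_values, k=3):
    """Average the property values of the k training hashkeys closest
    to the query hashkey (an assumed, deliberately simple learner)."""
    ranked = sorted(zip(train_keys, train_values),
                    key=lambda kv: hashkey_distance(query_key, kv[0]))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Toy inputs: arbitrary descriptors and arbitrary property values.
basis_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
train_molecules = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
train_values = [1.0, 2.0, 3.0, 2.5]

# Hashkeys are computed once and reused for every model built on them.
train_keys = [hashkey(m, basis_set) for m in train_molecules]
query_key = hashkey((0.15, 0.25), basis_set)
estimate = predict_knn(query_key, train_keys, train_values, k=2)
```

Each hashkey has one entry per basis molecule, so its length is fixed by the basis set rather than by the size of the target molecule. Because the keys are computed once up front, swapping in a different learner or a different target property only changes the final step.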