Using Molecular Fingerprint as Descriptors in the QSPR Study of Lipophilicity

Using SciTegic's extended connectivity fingerprints (ECFP) as raw descriptors, a robust partial least-squares (PLS) model for logP prediction was developed. The PLS model is based on 39 latent variables, and 8 additional correction factors account for effects such as intramolecular hydrogen bonding. The model performs similarly to ClogP for compounds with molecular weight in the 250-400 range but significantly better than ClogP for molecules with molecular weight above 400. Because modern drug discovery tends to generate larger candidate compounds, the PLS model is better suited to drug discovery applications. The good performance of this simple PLS model indicates that molecular fingerprints encode detailed structural information and that, when used properly, they outperform conventional descriptors in QSPR model development.
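The abstract does not spell out the fitting procedure. The following is a minimal sketch, not the authors' code, of how fingerprint bits can serve as raw descriptors for a single-response PLS regression. It assumes each molecule's extended connectivity fingerprint has already been reduced to a set of integer feature identifiers (a hypothetical input format; no cheminformatics toolkit is called), and it uses the standard NIPALS PLS1 algorithm in pure Python. The names `build_matrix` and `PLS1` are illustrative, and the 8 correction factors are not modeled.

```python
from typing import List, Optional, Sequence, Set, Tuple


def build_matrix(
    fingerprints: Sequence[Set[int]], vocabulary: Optional[List[int]] = None
) -> Tuple[List[List[float]], List[int]]:
    """Turn fingerprints (hypothetical sets of feature IDs) into 0/1 descriptor rows."""
    if vocabulary is None:
        vocabulary = sorted(set().union(*fingerprints))
    index = {feature: j for j, feature in enumerate(vocabulary)}
    rows = []
    for fp in fingerprints:
        row = [0.0] * len(vocabulary)
        for feature in fp:
            if feature in index:  # features unseen during training are ignored
                row[index[feature]] = 1.0
        rows.append(row)
    return rows, vocabulary


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PLS1:
    """Single-response PLS fitted with NIPALS; n_components = number of latent variables."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, X: List[List[float]], y: List[float]) -> "PLS1":
        n, m = len(X), len(X[0])
        # Mean-center descriptors and response.
        self.x_mean = [sum(row[j] for row in X) / n for j in range(m)]
        self.y_mean = sum(y) / n
        E = [[row[j] - self.x_mean[j] for j in range(m)] for row in X]
        f = [v - self.y_mean for v in y]
        self.W, self.P, self.q = [], [], []
        for _ in range(self.n_components):
            # Weight vector: direction in descriptor space most covariant with residual y.
            w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
            norm = dot(w, w) ** 0.5
            if norm < 1e-12:
                break  # residual y no longer correlates with residual X
            w = [v / norm for v in w]
            t = [dot(row, w) for row in E]  # latent-variable scores
            tt = dot(t, t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            q_a = dot(f, t) / tt
            # Deflate X and y before extracting the next latent variable.
            E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
            f = [f[i] - q_a * t[i] for i in range(n)]
            self.W.append(w)
            self.P.append(p)
            self.q.append(q_a)
        return self

    def predict_one(self, x: List[float]) -> float:
        # Apply the same sequence of projections and deflations to a new molecule.
        e = [x[j] - self.x_mean[j] for j in range(len(x))]
        y_hat = self.y_mean
        for w, p, q_a in zip(self.W, self.P, self.q):
            t = dot(e, w)
            y_hat += q_a * t
            e = [e[j] - t * p[j] for j in range(len(e))]
        return y_hat
```

In the reported model `n_components` would be 39. A production implementation would more likely rely on a numerical library, for example scikit-learn's `PLSRegression`, rather than pure-Python loops.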