Development of R-Group Fingerprints Based on the Local Landscape from an Attachment Point of a Molecular Structure

Molecular fingerprints are indispensable in medicinal chemistry for quantifying chemical structures. Fingerprints can also be calculated for substructures with attachment points, which are the positions where a substructure connects to its corresponding core structure. Because structures with attachment points can be crucial for understanding structure-activity relationships, fingerprints specialized for representing this structural feature are required. R-group fingerprints and R-group descriptors were proposed previously for this purpose; however, both representations have limitations. Current R-group fingerprints do not emphasize information about attachment points, and R-group descriptors are too sensitive to changes in the topological path length from an attachment point. In the present work, we developed novel R-group fingerprints, termed R-path fingerprints, which encode substituent information as seen from an attachment point without being sensitive to small differences in topological distance. The concept of R-path fingerprints is to describe a chemical substructure from the viewpoint of an attachment point, so that atomistic information around the attachment point is distinguished from that of the other parts of the substructure. This was achieved by considering all paths that lie on the shortest path between the attachment point and each atom in a substituent. Benchmark tests were conducted, including comparisons of similarity distributions and potency prediction for R-group substituents. The results indicate that R-path fingerprints should be useful for classifying and comparing substructures with attachment points.
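
The path-based idea described above can be illustrated with a minimal sketch. It is not the published R-path implementation: the graph encoding, the reading of "paths on the shortest path" as contiguous sub-paths, the string features, and the names `shortest_paths_from`, `path_features`, `fold`, and `tanimoto` are all assumptions made for illustration. The attachment point is written as the pseudo-atom `*`, so features that contain `*` are anchored at the attachment point, while the remaining features carry no distance information.

```python
from collections import deque
import hashlib


def shortest_paths_from(adjacency, start):
    """Breadth-first search: map each atom index to one shortest path from start."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in paths:
                paths[neighbor] = paths[current] + [neighbor]
                queue.append(neighbor)
    return paths


def path_features(symbols, adjacency, attachment=0):
    """Collect every contiguous sub-path found on a shortest path that starts
    at the attachment point (an assumed reading of the abstract)."""
    features = set()
    for path in shortest_paths_from(adjacency, attachment).values():
        for i in range(len(path)):
            for j in range(i, len(path)):
                features.add("-".join(symbols[k] for k in path[i:j + 1]))
    return features


def fold(features, n_bits=1024):
    """Hash string features onto bit positions of a fixed-length fingerprint."""
    return {
        int(hashlib.sha1(f.encode()).hexdigest(), 16) % n_bits for f in features
    }


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets or bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical substituents: ethyl (*-C-C) and propyl (*-C-C-C).
ethyl = path_features(["*", "C", "C"], {0: [1], 1: [0, 2], 2: [1]})
propyl = path_features(
    ["*", "C", "C", "C"], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
)
print(sorted(ethyl))            # ['*', '*-C', '*-C-C', 'C', 'C-C']
print(tanimoto(ethyl, propyl))  # 5 shared features out of 7 in total
```

In this toy case, extending the chain by one carbon adds only two new features, so the two substituents remain fairly similar (5/7). This is meant only to illustrate the kind of tolerance to small topological differences that the abstract describes.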