Identification of Potential Small Molecule Peptidomimetics Similar to Motifs in Proteins

Protein-protein interactions are central to most biological processes and represent a large and important class of targets for human therapeutics. Small molecules containing peptide substituents may mimic regions of interacting proteins and inhibit their interactions. We set out to develop efficient methods to screen for similarities between small molecules and known peptide structures within proteins. We developed a method to rank peptide-compound similarities that is restricted to small linear motifs in proteins and to compounds containing amino acid substituents. We applied it to a search of the PubChem database (5.4 million compounds) using all short motifs on accessible surface areas in a nonredundant set of 11,488 peptides from the Protein Data Bank (PDB). The search demonstrated both the feasibility of the method for high-throughput comparisons and the availability of compounds with comparable substituents: over 6 million compound-peptide pairs shared at least three amino acid substituents, and approximately 100,000 of these had an rmsd of less than 1 Å. We also developed a Z-score function that compares the matches of a compound to different instances of the peptide motif in the PDB. This provides an appropriate scoring function for comparing peptide-compound similarities that involve different numbers of atoms, while also enriching for similarities that are more likely to be specific to the protein of interest. Finally, we applied the method to searches of known short protein motifs against the National Cancer Institute Developmental Therapeutics Program compound database and identified a known true positive.
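The two quantities the abstract relies on, an rmsd between matched substituent atoms and a Z-score taken over motif instances, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume the rmsd is computed after optimal rigid superposition of paired atoms; here it uses Horn's quaternion formulation, with the largest eigenvalue found by power iteration. We also assume the Z-score measures how much lower a compound's rmsd to the query motif instance is than its rmsds to other PDB instances of the same motif. The function names `superposed_rmsd` and `instance_z_score` are hypothetical.

```python
import math
import statistics
from typing import Sequence

Coords = Sequence[Sequence[float]]  # paired 3D atom coordinates


def _centered(points: Coords) -> list:
    """Translate a point set so its centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def superposed_rmsd(xs: Coords, ys: Coords, iterations: int = 1000) -> float:
    """Minimum RMSD between paired points after optimal rotation + translation.

    Horn's quaternion method: the smallest sum of squared deviations equals
    Gx + Gy - 2*lambda_max, where lambda_max is the largest eigenvalue of a
    4x4 symmetric matrix built from the cross-covariance of the centred sets.
    Only proper rotations are allowed, so mirror images are not superposed.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally sized, non-empty point lists")
    x, y = _centered(xs), _centered(ys)
    # s[i][j] = sum over atoms of x_i * y_j
    s = [[sum(a[i] * b[j] for a, b in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    gx = sum(c * c for p in x for c in p)
    gy = sum(c * c for p in y for c in p)
    # All eigenvalues lie in [-sqrt(gx*gy), sqrt(gx*gy)]; shifting by
    # (gx + gy) / 2 makes them non-negative, so power iteration on the
    # shifted matrix converges to the eigenvector of the largest one.
    shift = (gx + gy) / 2.0
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(n_mat[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    # Rayleigh quotient of the unshifted matrix gives lambda_max
    lam = sum(v[i] * n_mat[i][j] * v[j] for i in range(4) for j in range(4))
    sq = max(gx + gy - 2.0 * lam, 0.0)  # clamp tiny negative rounding error
    return math.sqrt(sq / len(x))


def instance_z_score(rmsd_to_query: float, rmsd_to_others: Sequence[float]) -> float:
    """Hypothetical specificity score: how much lower the compound's rmsd to the
    query motif instance is than its mean rmsd to other instances of the same
    motif, in units of the sample standard deviation of those rmsds."""
    if len(rmsd_to_others) < 2:
        raise ValueError("need at least two other motif instances")
    mu = statistics.mean(rmsd_to_others)
    sd = statistics.stdev(rmsd_to_others)
    if sd == 0.0:
        raise ValueError("other instances give identical rmsds; Z is undefined")
    return (mu - rmsd_to_query) / sd
```

A proper rotation cannot superpose a point set onto its mirror image, which matters for chiral amino acid substituents. For example, pairing the points (0,0,0), (1,0,0), (0,1,0), (0,0,1) with the same points reflected through the xy-plane gives `superposed_rmsd` = 0.5, whereas a rotated and translated copy gives an rmsd of about 0 (up to floating-point error). Normalizing by the spread across motif instances is one way to make scores comparable when different compound-peptide pairs match different numbers of atoms.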