On the existence and implications of an inverse folding code in proteins.

The existence of a code relating the set of possible sequences at a given position in a protein backbone to the local structure at that position is investigated. It is shown that only 73% of 4-Cα structure fragments in a sample of 114 protein structures exhibit a preference for a particular set of sequences. The remaining structures can accommodate essentially any sequence. The structures that encode specific sequence distributions include the classical "secondary" structures, with the notable exception of planar (β) bends. It is suggested that this has implications for the mechanism of folding in proteins with extensive sheet/barrel structure. The possible role of structures that do not encode specific sequences as mutation hot spots is noted.