Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm.

A program called PROTEP is described that permits the rapid comparison of pairs of three-dimensional protein structures to identify the patterns of secondary structure elements that they have in common. Protein structures are represented as labelled graphs, in which the secondary structure elements of a protein correspond to the nodes and the spatial and angular relationships between them correspond to the edges. This representation was developed for an earlier program, POSSUM, which identified subgraph isomorphisms in protein structures. PROTEP takes this representation and uses a different and more flexible approach to locating structural patterns in pairs of proteins: a maximal common subgraph isomorphism algorithm that is based on a clique detection procedure. A range of searches is described to demonstrate that areas of structural overlap between protein structures taken from the Protein Data Bank can be identified both effectively and efficiently.
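
The general idea of finding a maximal common subgraph through clique detection can be illustrated with a minimal sketch. It is not the PROTEP implementation: the abstract does not name the clique detection procedure, so the standard Bron-Kerbosch algorithm with pivoting is used here as a stand-in. The node labels ('H' for helix, 'E' for strand), the edge labels (inter-element angle in degrees, distance in angstroms), the tolerance values, the toy data, and all function names are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical tolerances; real values would depend on the application.
ANGLE_TOL = 30.0   # degrees
DIST_TOL = 2.0     # angstroms


def compatible(edge_a, edge_b):
    """Two edges match if their angle and distance labels agree within tolerance."""
    (angle_a, dist_a), (angle_b, dist_b) = edge_a, edge_b
    return abs(angle_a - angle_b) <= ANGLE_TOL and abs(dist_a - dist_b) <= DIST_TOL


def correspondence_graph(types_a, edges_a, types_b, edges_b):
    """Build the correspondence graph of two labelled graphs.

    Each node is a pair (i, j) of same-typed elements from A and B.
    Two pairs are adjacent when they use distinct elements in both
    proteins and the connecting edges carry compatible labels.
    """
    nodes = [(i, j)
             for i, ta in enumerate(types_a)
             for j, tb in enumerate(types_b) if ta == tb]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue
        ea = edges_a.get(frozenset((i1, i2)))
        eb = edges_b.get(frozenset((j1, j2)))
        if ea is not None and eb is not None and compatible(ea, eb):
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def bron_kerbosch(adj, r, p, x, out):
    """Collect every maximal clique of adj into out (pivoting variant)."""
    if not p and not x:
        out.append(r)
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}


def largest_common_substructure(types_a, edges_a, types_b, edges_b):
    """Return the largest set of (index in A, index in B) element pairings."""
    adj = correspondence_graph(types_a, edges_a, types_b, edges_b)
    cliques = []
    bron_kerbosch(adj, set(), set(adj), set(), cliques)
    return max(cliques, key=len, default=set())


def edge_table(raw):
    """Key each (angle, distance) label by an unordered pair of node indices."""
    return {frozenset(pair): label for pair, label in raw.items()}


# Toy data (invented): protein A has four elements, protein B has three.
types_a = ["H", "E", "E", "H"]
edges_a = edge_table({(0, 1): (90, 10), (0, 2): (80, 12), (0, 3): (20, 8),
                      (1, 2): (170, 5), (1, 3): (60, 15), (2, 3): (100, 20)})
types_b = ["E", "E", "H"]
edges_b = edge_table({(0, 1): (165, 5.5), (0, 2): (85, 10.5), (1, 2): (75, 12.5)})

match = largest_common_substructure(types_a, edges_a, types_b, edges_b)
print(sorted(match))
```

Each pair in the result gives an element index in the first structure and the index of its counterpart in the second. Because a clique in the correspondence graph requires every pairwise relationship to be compatible, the matched elements form a common subgraph of the two labelled graphs, and the largest clique corresponds to the largest such common substructure.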