Protein fold comparison by the alignment of topological strings.

Using definitions of protein folds encoded as text strings, a dynamic programming algorithm was devised to compare pairs of folds, identify their largest common substructure, and calculate the distance (as the number of edit operations) separating this substructure from each of the two structures. This distance provided a metric by which the folds were clustered into a 'phylogenetic' tree. This construction differs from previous automatic structure-clustering algorithms in that it explicitly represents the structures at 'ancestral' branching nodes, even when these have no corresponding known structure. The resulting tree was compared with one compiled by an 'expert' in the field. While there was broad agreement, differences were found that arose from the differing emphasis placed on the types of operations that can be used to transform one structure into another. Some concluding speculations on the relationship of such trees to the evolutionary history and folding of the proteins are advanced.
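The general idea of comparing string-encoded folds, extracting a shared core, and building a tree whose internal nodes hold explicit 'ancestral' structures can be illustrated with a minimal sketch. This is not the authors' algorithm. It makes several simplifying assumptions. Each fold is a list of hypothetical tokens, such as `"B+"` for a strand and `"A-"` for a helix with a direction. The largest common substructure is approximated by a longest common subsequence computed by dynamic programming. Only deletions count as edit operations. The tree is built by a simple greedy agglomeration, in which the closest pair is merged and replaced by a node carrying their common core. The function names `common_core` and `build_tree` are illustrative only.

```python
from itertools import combinations


def common_core(a, b):
    """Longest common subsequence of two token lists, found by dynamic programming.

    Returns (core, dist_a, dist_b), where dist_x is the number of deletions
    needed to reduce x to the shared core (a deletion-only edit distance).
    """
    n, m = len(a), len(b)
    # score[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])
    # Trace back from the start of both strings to recover one optimal core.
    core, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            core.append(a[i])
            i += 1
            j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1
        else:
            j += 1
    return core, n - len(core), m - len(core)


def build_tree(folds):
    """Greedy agglomerative clustering (a simplification, not the paper's method).

    Repeatedly merges the closest pair of nodes and replaces it with an explicit
    'ancestor' node whose structure is the pair's common core.
    Each node is a (label, tokens) tuple; merged labels become nested tuples.
    """
    nodes = [(name, list(tokens)) for name, tokens in folds.items()]
    while len(nodes) > 1:
        best = None
        for x, y in combinations(range(len(nodes)), 2):
            core, dx, dy = common_core(nodes[x][1], nodes[y][1])
            if best is None or dx + dy < best[0]:
                best = (dx + dy, x, y, core)
        _, x, y, core = best
        ancestor = ((nodes[x][0], nodes[y][0]), core)
        nodes = [nd for k, nd in enumerate(nodes) if k not in (x, y)] + [ancestor]
    return nodes[0]


if __name__ == "__main__":
    # Hypothetical topological strings for three folds.
    folds = {
        "f1": ["B+", "A+", "B+", "A+"],
        "f2": ["B+", "A+", "B+"],
        "f3": ["A-", "A+", "A-"],
    }
    print(build_tree(folds))  # → (('f3', ('f1', 'f2')), ['A+'])
```

Here `f1` and `f2` are merged first (distance 1), and their ancestor `["B+", "A+", "B+"]` is kept as an explicit structure even though no input fold has exactly that form. A closer model of the paper would need the richer edit operations implied by its topological strings. As the abstract notes, the weighting given to those operations is what shifts the tree relative to an expert classification.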