Knowledge based modelling of homologous proteins, Part II: Rules for the conformations of substituted sidechains.

This paper describes a rapid, automated procedure for modelling sidechains that uses (i) spatial information from sidechains at topologically equivalent positions, as far as such a correlation is observed, and then (ii) the most probable conformations of the sidechains in the respective secondary structure type. Analysis of topologically equivalent residues in the structurally conserved regions of a family of proteins implies that the spatial positions of the sidechain atoms, rather than the sidechain conformations, should be considered when model building. Rules for modelling all 20 sidechains from each other in alpha-helical, beta-sheet and loop regions (a total of 1200) are established. Cluster analysis of positional data from the sidechain atoms of structurally equivalent residues in a homologous family is used to guide modelling. Where the equivalent sidechains of the homologous proteins provide no useful guidance, atoms are modelled using the most probable conformation for the sidechain. To test the procedure, we have modelled the sidechains of the residues in the structurally conserved regions of myoglobin from four other globins. The automated procedure described here has been incorporated into the program COMPOSER.
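
The two-stage logic summarised above can be illustrated with a minimal Python sketch. It is not the COMPOSER implementation: the names `rule_keys` and `place_atom`, the parameters `tolerance` and `min_support`, and the simple distance-based grouping standing in for the cluster analysis are all assumptions made for illustration. The sketch shows only that the figure of 1200 is consistent with 20 x 20 x 3 (source residue, target residue, secondary structure type), and how an atom might be placed from equivalent sidechains when they agree, falling back to a most probable conformation otherwise.

```python
import math
from itertools import product

AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
SECONDARY_STRUCTURES = ["helix", "sheet", "loop"]


def rule_keys():
    """Enumerate (source residue, target residue, secondary structure) keys.

    Assumption: one rule per ordered pair of residue types and per
    secondary structure type, which gives 20 * 20 * 3 entries.
    """
    return list(product(AMINO_ACIDS, AMINO_ACIDS, SECONDARY_STRUCTURES))


def centroid(points):
    """Mean position of a non-empty list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def place_atom(equivalent_positions, fallback_position,
               tolerance=1.0, min_support=2):
    """Choose a position for one sidechain atom.

    equivalent_positions: (x, y, z) positions of the corresponding atom in
        topologically equivalent residues of the homologous structures.
    fallback_position: position from the most probable conformation for
        the secondary structure type (supplied by the caller).
    tolerance, min_support: hypothetical thresholds; a crude stand-in for
        the cluster analysis, not the published criteria.
    """
    best_group = []
    for seed in equivalent_positions:
        # Gather every position lying within `tolerance` of this seed.
        group = [p for p in equivalent_positions
                 if math.dist(seed, p) <= tolerance]
        if len(group) > len(best_group):
            best_group = group
    if len(best_group) >= min_support:
        # Stage (i): the homologues agree, so use their mean position.
        return centroid(best_group)
    # Stage (ii): no useful guidance, so use the most probable conformation.
    return fallback_position
```

For example, `place_atom([(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (5.0, 5.0, 5.0)], (9.0, 9.0, 9.0))` averages the two nearby positions and ignores the outlier, whereas two widely separated positions give no agreement and the fallback position is returned.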