An approach to protein homology modelling based on an ensemble of NMR structures: application to the Sox-5 HMG-box protein.

A new approach has been developed to reduce the multiple protein structures obtained from NMR structure analysis to a smaller number of representative structures that still reflect the structural diversity of the data sets. The method, based on clustering similar structures, has been tested in homology model building of Sox-5, a sequence-specific DNA-binding protein belonging to the high mobility group (HMG) family of nuclear proteins. Sox (SRY box) genes are autosomal genes related to SRY, the sex-determining gene on the Y chromosome. The Sox-5 protein, encoded by one of these SRY-related genes, shares 29% sequence identity with the HMG1 B-box domain, whose structure, determined previously by NMR, was used in our study to predict the structure of Sox-5. Two independent ensembles of HMG1 structures, each consisting of closely related coordinate sets, were used. Nine representative HMG1 structures were then selected as starting points for modelling Sox-5. The resulting model shows close similarity to the HMG1 fold, with differences at the secondary structure level located mainly in alpha-helices 1 and 3. A left-handed, three-residue-per-turn polyproline II helix, forming a conserved polyproline II/alpha-helix supersecondary motif, was identified in the N-terminal region of Sox-5 and other HMG boxes.
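
The idea of reducing an NMR ensemble to a few representatives by clustering can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm was used, so everything below is an assumption made for illustration: conformers are compared by distance-matrix RMSD (which needs no superposition), grouped by average-linkage agglomerative clustering up to a hypothetical `cutoff`, and each cluster is represented by its medoid. The function names `drmsd` and `cluster_ensemble` are likewise hypothetical.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance-matrix RMSD between two conformers.

    `a` and `b` are lists of (x, y, z) tuples with identical atom ordering
    (for example, the C-alpha atoms of each NMR model). Assumed measure,
    not taken from the source.
    """
    pairs = list(combinations(range(len(a)), 2))
    total = 0.0
    for i, j in pairs:
        total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(total / len(pairs))


def cluster_ensemble(models, cutoff):
    """Group conformers and pick one representative (medoid) per cluster.

    Average-linkage agglomerative clustering: the two closest clusters are
    merged repeatedly until their average dRMSD exceeds `cutoff`.
    Returns (clusters, representatives) as lists of model indices.
    """
    n = len(models)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = drmsd(models[i], models[j])

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y

    # Medoid: the member with the smallest summed distance to its cluster.
    representatives = [
        min(c, key=lambda i: sum(d[i][j] for j in c)) for c in clusters
    ]
    return clusters, representatives
```

In this sketch the number of representatives is not fixed in advance; it follows from the chosen `cutoff`, so a count such as the nine HMG1 structures mentioned above would correspond to one particular threshold under these assumptions.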