The importance of alignment accuracy for molecular replacement.

Many crystallographic protein structures are being determined using molecular replacement (MR), a model-based phasing method that has become increasingly important with the steady growth of the PDB. While several highly automated software packages for MR exist, methods for preparing optimal MR search models remain relatively unexplored. Recent advances in sequence-comparison methods allow the detection of more distantly related homologs and more accurate alignment of their sequences. This study investigated whether simple homology models (without modeling of unaligned regions) based on alignments from these improved methods can increase the potential of MR. Twenty-seven crystal structures were determined using a highly parallelized MR pipeline that facilitates all steps, including homology detection, model preparation, MR searches, and automated refinement and rebuilding. Several types of search models, prepared with standard sequence-sequence alignment (BLAST) and with more accurate profile-sequence and profile-profile methods (PSI-BLAST, FFAS), were compared in MR trials. The analysis shows that models based on more accurate alignments have a higher success rate in cases where the unknown structure and the search model share less than 35% sequence identity. It is concluded that using different types of simple models based on accurate alignments can significantly increase the success rate of MR.
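
The two quantities the abstract relies on, the sequence identity between target and search model and the set of template residues kept in a "simple" model, can be illustrated with a minimal Python sketch. It is not the pipeline's implementation. It assumes a gapped pairwise alignment given as two equal-length strings with `-` for gaps, and it assumes identity is computed over columns where both sequences have a residue (other denominators are in common use). The function names are hypothetical.

```python
def _check(target_aln: str, template_aln: str) -> None:
    # Both rows of a pairwise alignment must have the same number of columns.
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")


def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the denominator is the number of aligned (gap-free) columns.
    """
    _check(target_aln, template_aln)
    cols = [(t, m) for t, m in zip(target_aln, template_aln)
            if t != "-" and m != "-"]
    if not cols:
        return 0.0
    matches = sum(1 for t, m in cols if t == m)
    return 100.0 * matches / len(cols)


def retained_template_positions(target_aln: str, template_aln: str) -> list[int]:
    """0-based indices (in the ungapped template) of residues aligned to a
    target residue. Template residues facing a target gap are dropped,
    mirroring a model built without unaligned regions."""
    _check(target_aln, template_aln)
    kept = []
    idx = 0  # position in the ungapped template sequence
    for t, m in zip(target_aln, template_aln):
        if m == "-":
            continue  # no template residue in this column
        if t != "-":
            kept.append(idx)
        idx += 1
    return kept


if __name__ == "__main__":
    target = "ACDE-FGH"    # toy alignment, not real data
    template = "ACNEK-GH"
    print(round(percent_identity(target, template), 1))
    print(retained_template_positions(target, template))
```

Under this definition, a target-template pair would fall into the regime discussed above when `percent_identity` returns a value below 35.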
