Protein structure prediction constrained by solution X-ray scattering data and structural homology identification.

Here we systematically explore the use of distance constraints derived from small-angle X-ray scattering (SAXS) measurements to filter candidate protein structures for protein structure prediction. This task is intrinsically more complex than applying distance constraints derived from NMR data, where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, by contrast, yields a histogram of pair distances (the pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse-grained approach to ab initio protein structure prediction, which we use to generate a candidate set of Cα backbones. Despite the lack of residue-specific information in the SAXS data, our study shows that a SAXS filter effectively purifies the set of native structure candidates and thus substantially improves the reliability of protein structure prediction. We test the quality of the predicted Cα backbones by running structural homology searches against the Dali domain library, and find the results encouraging. Despite the lack of local structural detail and the limited modeling accuracy at the Cα backbone level, useful information about fold classification can be extracted from this procedure. The approach thus provides a way to use a SAXS-based structure prediction algorithm to generate potential structural homologies in cases where a lack of sequence homology prevents identification of candidate folds for a given protein. It therefore has the potential to help determine the biological function of a protein from structural homology rather than sequence homology.
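The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bin width, the squared-difference score, and the fraction of candidates kept are all assumptions made for illustration, and the toy histogram counts every Cα pair equally, whereas a real SAXS pair distribution function is weighted by scattering contrast. The sketch only shows how a histogram of pair distances can rank candidate backbones without knowing which residue pairs fall in which bin.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width=2.0, r_max=60.0):
    """Normalized histogram of all pairwise distances (hypothetical helper).

    coords: sequence of (x, y, z) tuples, e.g. C-alpha positions in angstroms.
    Pair identities are discarded, mirroring what SAXS provides.
    """
    n_bins = int(math.ceil(r_max / bin_width))
    counts = [0] * n_bins
    for a, b in combinations(coords, 2):
        idx = int(math.dist(a, b) / bin_width)
        if idx < n_bins:  # distances beyond r_max are ignored
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def histogram_mismatch(p_model, p_target):
    """Sum of squared bin differences (an assumed scoring function)."""
    return sum((m - t) ** 2 for m, t in zip(p_model, p_target))


def saxs_filter(candidates, p_target, keep_fraction=0.1):
    """Keep the candidates whose histograms best match the target."""
    ranked = sorted(
        candidates,
        key=lambda c: histogram_mismatch(pair_distance_histogram(c), p_target),
    )
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy usage: an extended chain versus a compact 3x3 arrangement,
# both with 3.8 angstrom spacing between neighbouring points.
extended = [(3.8 * i, 0.0, 0.0) for i in range(10)]
compact = [(3.8 * (i % 3), 3.8 * (i // 3), 0.0) for i in range(9)]

# Stand-in for a measured pair distribution: here taken from the compact model.
target = pair_distance_histogram(compact)
survivors = saxs_filter([extended, compact], target, keep_fraction=0.5)
```

In this toy case the compact model is the one retained, because its histogram matches the stand-in target exactly while the extended chain places weight in long-distance bins that the target leaves empty.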
