Datamining Protein Structure Databanks for Crystallization Patterns of Proteins

Abstract: A study of 345 protein structures, selected from among 1,500 structures determined by nuclear magnetic resonance (NMR) methods, revealed useful correlations between crystallization properties and several parameters of the studied proteins. NMR methods of structure determination do not require the growth of protein crystals and hence allow comparison of the properties of proteins that have or have not been the subject of crystallographic approaches. One- and two-dimensional statistical analyses of the data confirmed a hypothesized relation between the size of the molecule and its crystallization potential. Furthermore, two-dimensional Bayesian analysis revealed a significant relationship between the relative ratio of different secondary structures and the likelihood of success in crystallization trials. The most immediate result is an apparent correlation of crystallization potential with protein size. Further analysis of the data revealed a relationship between the unstructured fraction of a protein and the success of its crystallization. Bayesian analysis based on the latter correlation yielded a prediction performance of about 64%, whereas a two-dimensional Bayesian analysis achieved a performance of about 75%.
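
To make the one- and two-dimensional Bayesian analyses more concrete, the following is a minimal sketch of a Bayes classifier, not the authors' implementation. It assumes Gaussian class-conditional densities with independent features, which the abstract does not specify. The feature names (residue count as a stand-in for protein size, unstructured fraction), the function names, and every number in `toy_data` are hypothetical and chosen only so the example runs; they do not reproduce the study's data or the direction of its correlations.

```python
import math

def fit_gaussian_bayes(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n, dims = len(rows), len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-9)
            for d in range(dims)
        ]
        model[cls] = (n / len(samples), means, variances)
    return model

def log_posterior(model, x):
    """Unnormalized log posterior log P(class) + sum_d log N(x_d | mean, var)."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        scores[cls] = score
    return scores

def predict(model, x):
    """Return the class with the highest posterior (Bayes decision rule)."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Hypothetical toy data: ((residue_count, unstructured_fraction), label),
# where label 1 means "crystallized" and 0 means "not crystallized".
toy_data = [
    ((60, 0.45), 0), ((75, 0.50), 0), ((90, 0.40), 0), ((110, 0.42), 0),
    ((150, 0.20), 1), ((180, 0.25), 1), ((200, 0.15), 1), ((240, 0.22), 1),
]
features_2d = [x for x, _ in toy_data]
labels = [y for _, y in toy_data]

# Two-dimensional analysis: both features together.
model_2d = fit_gaussian_bayes(features_2d, labels)

# One-dimensional analysis: unstructured fraction only.
features_1d = [(x[1],) for x in features_2d]
model_1d = fit_gaussian_bayes(features_1d, labels)
```

Calling `predict(model_2d, (210, 0.18))` classifies a new hypothetical protein with both features, while `predict(model_1d, (0.45,))` uses the unstructured fraction alone. On real data, comparing held-out accuracy of the two models is one way to check whether a second feature improves prediction, as the roughly 64% versus 75% figures in the abstract suggest it did in the study.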
