Strategies for structural proteomics of prokaryotes: Quantifying the advantages of studying orthologous proteins and of using both NMR and X‐ray crystallography approaches

Only about half of non‐membrane‐bound proteins encoded by either bacterial or archaeal genomes are soluble when expressed in Escherichia coli (Yee et al., Proc Natl Acad Sci USA 2002;99:1825–1830; Christendat et al., Prog Biophys Mol Biol 2000;73:339–345). This property limits genome‐scale functional and structural proteomics studies, which depend on having a recombinant, soluble version of each protein. An emerging strategy to increase the probability of obtaining a soluble derivative of a protein is to study different sequence homologs of the same protein, including representatives from thermophilic organisms, on the assumption that the stability of these proteins will facilitate structural analysis. To estimate the relative merits of this strategy, we compared the recombinant expression, solubility, and suitability for structural analysis by NMR and/or X‐ray crystallography of 68 pairs of homologous proteins from E. coli and Thermotoga maritima. A sample suitable for structural studies was obtained for 62 of the 68 pairs of homologs under standardized growth and purification procedures. Fourteen samples (eight E. coli and six T. maritima proteins) generated NMR spectra of a quality suitable for structure determination, and 30 samples (14 E. coli and 16 T. maritima proteins) formed crystals. Only three samples (one E. coli and two T. maritima proteins) both crystallized and had excellent NMR properties. The conclusions from this work are: (1) the inclusion of even a single ortholog of a target protein increases the number of samples for structural studies almost twofold; (2) there was no clear advantage to the use of thermophilic proteins to generate samples for structural studies; and (3) for the small proteins analyzed here, the use of both NMR and crystallography approaches almost doubled the number of samples for structural studies. Proteins 2003;50:392–399. © 2003 Wiley‐Liss, Inc.

[1]  A Sali,et al.  Comparative protein modeling by satisfaction of spatial restraints. , 1996, Molecular medicine today.

[2]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[3]  H Jhoti High-throughput structural proteomics using x-rays. , 2001, Trends in biotechnology.

[4]  Martin Hammarström,et al.  Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli , 2002, Protein science : a publication of the Protein Society.

[5]  S. Elledge,et al.  Towards genetic genome projects: genomic library screening and gene-targeting vector construction in a single step , 2002, Nature Genetics.

[6]  T. Ouellet,et al.  Towards genomic and proteomic studies of protein phosphorylation in plant-pathogen interactions. , 2002, Trends in plant science.

[7]  Andrej Šali,et al.  Comparative protein modeling by satisfaction of spatial restraints , 1995 .

[8]  Robert A. Thompson,et al.  Comparative proteomics based on stable isotope labeling and affinity selection. , 2002, Journal of mass spectrometry : JMS.

[9]  W G Krebs,et al.  PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information. , 2001, Nucleic acids research.

[10]  R. Varadarajan,et al.  Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study. , 2002, Biochemistry.

[11]  R. Simpson,et al.  Cancer proteomics: from signaling networks to tumor markers. , 2001, Trends in biotechnology.

[12]  S. Gygi,et al.  Correlation between Protein and mRNA Abundance in Yeast , 1999, Molecular and Cellular Biology.

[13]  E. Pai,et al.  Crystal Structure of dTDP-4-keto-6-deoxy-d-hexulose 3,5-Epimerase from Methanobacterium thermoautotrophicum Complexed with dTDP , 2000, The Journal of Biological Chemistry.

[14]  S. Grant,et al.  Proteomics of multiprotein complexes: answering fundamental questions in neuroscience. , 2001, Trends in biotechnology.

[15]  M Gerstein,et al.  Protein folds in the worm genome. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[16]  D. Wishart,et al.  An NMR approach to structural proteomics , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[17]  Jérôme Wojcik,et al.  Protein-protein interaction map inference using interacting domain profile pairs , 2001, ISMB.

[18]  Søren Brunak,et al.  A Neural Network Method for Identification of Prokaryotic and Eukaryotic Signal Peptides and Prediction of their Cleavage Sites , 1997, Int. J. Neural Syst..

[19]  D. Eisenberg,et al.  Atomic solvation parameters applied to molecular dynamics of proteins in solution , 1992, Protein science : a publication of the Protein Society.

[20]  Mark Gerstein,et al.  Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. , 2002, Genome research.

[21]  Mark Gerstein,et al.  Structural proteomics of an archaeon , 2000, Nature Structural Biology.

[22]  Z. Malik,et al.  Nuclear distribution of porphobilinogen deaminase (PBGD) in glioma cells: a regulatory role in cancer transformation? , 2002, British Journal of Cancer.

[23]  G. H. Coombs,et al.  Characterisation of global protein expression by two-dimensional electrophoresis and mass spectrometry: proteomics of Toxoplasma gondii. , 2002, International journal for parasitology.

[24]  M Gerstein,et al.  Structural proteomics: prospects for high throughput sample preparation. , 2000, Progress in biophysics and molecular biology.

[25]  R. Laskowski,et al.  Crystal Structure of Thermotoga maritima 0065, a Member of the IclR Transcriptional Factor Family , 2002, The Journal of Biological Chemistry.