Structuring the universe of proteins.

High-throughput sequencing of human genomes and those of important model organisms (mouse, Drosophila melanogaster, Caenorhabditis elegans, fungi, archaea) and bacterial pathogens has laid the foundation for another "big science" initiative in biology. Together, X-ray crystallographers, nuclear magnetic resonance (NMR) spectroscopists, and computational biologists are pursuing high-throughput structural studies aimed at developing a comprehensive three-dimensional view of the protein structure universe. The new science of structural genomics promises more than 10,000 experimental protein structures and millions of calculated homology models of related proteins. The evolutionary underpinnings and technological challenges of automating target selection, protein expression and purification, sample preparation, NMR and X-ray data measurement/analysis, homology modeling, and structure/function annotation are discussed in detail. An informative case study from one of the structural genomics centers funded by the National Institutes of Health and the National Institute of General Medical Sciences (NIH/NIGMS) demonstrates how this experimental/computational pipeline will reveal important links between form and function in biology and provide new insights into evolution and human health and disease.
