Structural genomics: keeping up with expanding knowledge of the protein universe.

Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in a rapid expansion of protein sequence space: a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Despite the growth of sequence space, nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can at present be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG share of new Pfam structures nearly doubled, from 27.2% in 2003 to 51.6% in 2006.
