Structural genomics is the largest contributor of novel structural leverage

The Protein Structure Initiative (PSI) of the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large protein families that lack any structural coverage, as well as very large families whose structural coverage is inadequate. Here, we report a few simple metrics that show how successfully these efforts optimize structural coverage. While PSI-2 (2005 to date) contributed more than 8% of all structures deposited in the PDB, it contributed over 20% of all novel structures, i.e. structures of protein sequences that had no structural representative in the PDB on the date of deposition. The structural coverage of the protein universe, as represented by the current UniProt release (v12.8), increased linearly from 1992 to 2008, and structural genomics has contributed significantly to sustaining this growth rate. The success in increasing novel leverage (as defined by Liu et al., Nat Biotechnol 25:849–851, 2007 [14]) has resulted from the systematic targeting of large families. Over the past 8 years, the PSI's per-structure contribution to novel leverage was more than 4-fold higher than that of non-PSI structural biology efforts. If the success of the PSI continues, covering most sequences in the current UniProt database may take only about 15 more years.
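The two headline metrics above, the share of novel structures and the per-structure novel leverage, can be illustrated with a small sketch. This is not the pipeline used in this study: the `Deposit` record, its `seq_id` and `covered` fields, and the toy entries are all hypothetical. `covered` stands in for whatever homology or modelability criterion decides which sequences a structure represents, and same-day deposits are ordered by `pdb_id`, an arbitrary assumed tie-break. Novel leverage is simplified here to "number of sequences covered for the first time by this structure".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deposit:
    """One PDB deposition; all fields are illustrative, not a real PDB schema."""
    pdb_id: str
    deposited: date
    seq_id: str          # hypothetical ID of the deposited protein's sequence
    covered: frozenset   # hypothetical: sequences this structure can represent
    is_psi: bool         # deposited by a PSI center?


def novel_leverage(deposits):
    """Map pdb_id -> (is_novel, leverage), processing deposits in date order.

    is_novel: the deposited sequence had no structural representative yet.
    leverage: number of sequences covered for the first time by this structure.
    """
    seen = set()
    result = {}
    # Same-day ties are broken by pdb_id (an arbitrary, assumed rule).
    for d in sorted(deposits, key=lambda d: (d.deposited, d.pdb_id)):
        result[d.pdb_id] = (d.seq_id not in seen, len(d.covered - seen))
        seen |= d.covered | {d.seq_id}
    return result


def summarize(deposits):
    """PSI share of all vs. novel deposits, and the PSI/non-PSI leverage ratio."""
    stats = novel_leverage(deposits)
    psi = [d for d in deposits if d.is_psi]
    other = [d for d in deposits if not d.is_psi]
    novel = [d for d in deposits if stats[d.pdb_id][0]]
    psi_novel = [d for d in novel if d.is_psi]
    psi_lev = sum(stats[d.pdb_id][1] for d in psi)
    other_lev = sum(stats[d.pdb_id][1] for d in other)
    # (psi_lev / len(psi)) / (other_lev / len(other)), cross-multiplied
    # so that no intermediate float is rounded.
    if psi and other_lev:
        ratio = (psi_lev * len(other)) / (len(psi) * other_lev)
    else:
        ratio = float("nan")
    return {
        "psi_share_all": len(psi) / len(deposits),
        "psi_share_novel": len(psi_novel) / len(novel) if novel else 0.0,
        "leverage_ratio": ratio,
    }


# Toy data, invented purely for illustration.
EXAMPLE = [
    Deposit("1AAA", date(2005, 1, 1), "P1", frozenset({"P1", "P2", "P3"}), False),
    Deposit("2BBB", date(2006, 1, 1), "P2", frozenset({"P1", "P2"}), False),
    Deposit("2CCC", date(2006, 6, 1), "Q1", frozenset({"Q1", "Q2", "Q3", "Q4"}), True),
    Deposit("3DDD", date(2007, 1, 1), "Q2", frozenset({"Q2", "Q5"}), False),
]

print(summarize(EXAMPLE))
# {'psi_share_all': 0.25, 'psi_share_novel': 0.5, 'leverage_ratio': 3.0}
```

In this toy set the single PSI deposit is a quarter of all deposits but half of the novel ones, which mirrors, in miniature, the kind of contrast the abstract reports (8% of all structures versus over 20% of novel ones). Processing deposits chronologically matters: a structure only earns leverage for sequences that no earlier structure already covered.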

[1] C. Fraser-Liggett, et al. Insights on biology and evolution from microbial genome sequencing. Genome Research, 2005.

[2] Benjamin J. Raphael, et al. The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. PLoS Biology, 2007.

[3] Michael Levitt, et al. Growth of novel protein structural data. Proceedings of the National Academy of Sciences, 2007.

[4] Gaetano T. Montelione, et al. Automatic target selection for structural genomics on eukaryotes. Proteins, 2004.

[5] Burkhard Rost, et al. CHOP: parsing proteins into structural domains. Nucleic Acids Research, 2004.

[6] Frances M. G. Pearl, et al. CATHEDRAL: A Fast and Effective Algorithm to Predict Folds and Domain Boundaries from Multidomain Protein Structures. PLoS Computational Biology, 2007.

[7] A. Sali, et al. Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 2000.

[8] Annabel E. Todd, et al. Target Selection and Determination of Function in Structural Genomics. IUBMB Life, 2003.

[9] David A. Lee, et al. Progress towards mapping the universe of protein folds. Genome Biology, 2004.

[10] Roland L. Dunbrack, et al. Outcome of a workshop on archiving structural models of biological macromolecules. Structure, 2006.

[11] M. Gerstein, et al. Structural Genomics: Current Progress. Science, 2003.

[12] Burkhard Rost, et al. Domains, motifs and clusters in the protein universe. Current Opinion in Chemical Biology, 2003.

[13] Marc A. Martí-Renom, et al. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Research, 2005.

[14] Jinfeng Liu, et al. Novel leverage of structural genomics. Nature Biotechnology, 2007.

[15] Cathy H. Wu, et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Research, 2004.

[16] J. Banfield, et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature, 2004.

[17] Adam Zemla, et al. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins, 2005.

[18] Tim J. P. Hubbard, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Research, 2007.

[19] A. G. Murzin, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 1995.

[20] David C. Jones, et al. CATH--a hierarchic classification of protein domain structures. Structure, 1997.

[21] Marc A. Martí-Renom, et al. EVA: evaluation of protein structure prediction servers. Nucleic Acids Research, 2003.

[22] András Fiser, et al. Molecular Biophysics. 2022.

[23] Frances M. G. Pearl, et al. Recognizing the fold of a protein structure. Bioinformatics, 2003.

[24] Haruki Nakamura, et al. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Research, 2006.

[25] Christine A. Orengo, et al. Target selection for structural genomics: an overview. Methods in Molecular Biology, 2008.

[26] Nigel J. Martin, et al. Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Research, 2007.

[27] Tong Liu, et al. The Status of Structural Genomics Defined Through the Analysis of Current Targets and Structures. Pacific Symposium on Biocomputing, 2003.

[28] Anna Tramontano, et al. Critical assessment of methods of protein structure prediction: Round VII. Proteins, 2007.

[29] Gaetano T. Montelione, et al. Assessing model accuracy using the homology modeling automatically software. Proteins, 2007.

[30] Torsten Schwede, et al. The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Research, 2004.

[31] Jeremy M. Berg, et al. Update on the protein structure initiative. Structure, 2007.

[32] A. Lesk, et al. The relation between the divergence of sequence and structure in proteins. The EMBO Journal, 1986.

[33] John D. Westbrook, et al. TargetDB: a target registration database for structural genomics projects. Bioinformatics, 2004.

[34] András Fiser, et al. Comparative protein structure modeling by combining multiple templates and optimizing sequence-to-structure alignments. Bioinformatics, 2007.

[35] Marco Punta, et al. Structural genomics reveals EVE as a new ASCH/PUA-related domain. Proteins, 2009.

[36] S. Brenner, et al. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches. Proteins, 2004.

[37] C. Sander, et al. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins, 1991.