Protein structure determination by exhaustive search of Protein Data Bank derived databases

Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefiting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that template structures with low structural coverage (less than 12%) and low sequence identity (less than 20%) can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes for which no homologous protein folds or sequences are known can benefit significantly from such a technique. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of the minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
