Identification of protein domains by shotgun proteolysis.

The identification of protein domains within multi-domain proteins is a persistent problem. Here, we describe an experimental method (shotgun proteolysis) based on random DNA fragmentation and protease selection of the encoded polypeptides on phage for this purpose. We applied the method to the Escherichia coli genome and identified 124 protease-resistant fragments; several were re-cloned for expression as soluble fragments in bacteria, and corresponded to autonomously folding units with folding energies similar to natural protein domains (ΔG_u = 3.8-6.6 kcal/mol). Structural information was available for approximately half of the selected proteins, which corresponded to compact, globular and domain-sized units that had been derived from a wide range of protein superfamilies. Furthermore, boundaries of the selected fragments correlated with domain boundaries as defined by bioinformatics predictions (R² = 0.82; p = 0.016). However, predictions were incomplete or entirely lacking for the remaining fragments, reflecting the limited proteome coverage of current bioinformatics methods. Shotgun proteolysis therefore provides a means to identify domains and other autonomously folding units on a genome-wide scale, without any prior knowledge of sequence or structure. Shotgun proteolysis should be particularly valuable for structural studies of proteins and represents a high-throughput alternative to the classical limited proteolysis method for the isolation of stable components of multi-domain proteins.
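
To give a feel for the reported unfolding free energies, the following minimal sketch converts a value of ΔG_u into the equilibrium fraction of folded molecules. It assumes a simple two-state folding model and a temperature of 25 °C; neither assumption is stated in the abstract, and the function name `fraction_folded` is purely illustrative.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def fraction_folded(delta_g_unfold_kcal, temperature_k=298.15):
    """Fraction of folded molecules under a two-state model (assumed here).

    delta_g_unfold_kcal: free energy of unfolding in kcal/mol
    temperature_k: absolute temperature; 298.15 K is an assumption
    """
    # Equilibrium constant for unfolding: K_u = [unfolded] / [folded]
    k_unfold = math.exp(-delta_g_unfold_kcal / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + k_unfold)


if __name__ == "__main__":
    # The two ends of the range quoted in the abstract
    for dg in (3.8, 6.6):
        print(f"dG_u = {dg} kcal/mol -> fraction folded = {fraction_folded(dg):.6f}")
```

Under these assumptions, both ends of the reported range correspond to a population that is more than 99% folded at equilibrium, which is consistent with the description of the fragments as autonomously folding units.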