The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function

Background
The PD-(D/E)XK nuclease superfamily, initially identified in type II restriction endonucleases and later in many enzymes involved in DNA recombination and repair, is one of the most challenging targets for protein sequence analysis and structure prediction. Typically, the sequence similarity between these proteins is so low that most of the relationships between known members of the PD-(D/E)XK superfamily were identified only after the corresponding structures had been determined experimentally. It is therefore tempting to speculate that the uncharacterized protein families include potential nucleases that remain to be discovered, but whose identification requires more sensitive tools than traditional PSI-BLAST searches.

Results
The low degree of amino acid conservation hampers identification of new members of the PD-(D/E)XK superfamily based solely on sequence comparisons to known members. Therefore, we used HHsearch, a recently developed method for sensitive detection of remote similarities between protein families represented as profile Hidden Markov Models enhanced by secondary structure. To detect significant similarities, we compared known families of PD-(D/E)XK nucleases with a database comprising the COG and PFAM profiles, which correspond to both functionally characterized and uncharacterized protein families. The initial candidates for new nucleases were subsequently verified by sequence-structure threading, comparative modeling, and identification of potential active site residues.

Conclusion
In this article, we report identification of the PD-(D/E)XK nuclease domain in numerous proteins that are implicated in interactions with DNA but whose structure and mechanism of action are unknown (such as the putative recombinase RmuC, the DNA competence factor CoiA, the DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4) and in proteins to which no function has been assigned to date (such as YhcG, various phage proteins, and novel candidates for restriction enzymes). Our results contribute to reducing the "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of new nucleases, many of which may be important for a complete understanding of the mechanisms that govern the evolution and stability of the genome.
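As an illustration of the profile-profile comparison mentioned in the Results, the sketch below scores a pair of profile columns with a log-sum-of-odds term, which is the column score we understand HHsearch-style HMM-HMM comparison to be built on. It is a toy under stated assumptions, not the authors' pipeline: the four-letter alphabet, the uniform background frequencies, and the example columns are hypothetical, and the transition probabilities and secondary structure term of the real method are omitted.

```python
import math

# Hypothetical toy alphabet with uniform background frequencies
# (real profiles use 20 amino acids and empirical backgrounds).
BACKGROUND = {"A": 0.25, "D": 0.25, "K": 0.25, "L": 0.25}


def column_score(q, p, background=BACKGROUND):
    """Log-sum-of-odds score (in bits) between two profile columns.

    q and p map each residue to its probability in the column.
    Positive scores mean the columns share more probability mass
    than expected from the background alone.
    """
    total = sum(q[a] * p[a] / background[a] for a in background)
    return math.log2(total)


# Hypothetical columns: two conserved acidic positions and one conserved lysine.
conserved_d = {"A": 0.05, "D": 0.85, "K": 0.05, "L": 0.05}
conserved_k = {"A": 0.05, "D": 0.05, "K": 0.85, "L": 0.05}

print(column_score(conserved_d, conserved_d))  # positive: same conserved residue
print(column_score(conserved_d, conserved_k))  # negative: different conserved residues
print(column_score(BACKGROUND, BACKGROUND))    # zero: no information in either column
```

Because whole columns are compared rather than single residues, two families can score well at a shared conserved position, such as a catalytic Asp or Lys, even when individual sequences from those families show little pairwise identity.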
