New Knowledge from Old: In silico discovery of novel protein domains in Streptomyces coelicolor

Background
Streptomyces coelicolor has long been considered a remarkable bacterium, with a complex life cycle, a ubiquitous environmental distribution, linear chromosomes and plasmids, and a huge range of pharmaceutically useful secondary metabolites. Completion of the genome sequence demonstrated that this diversity carries through to the genetic level, with over 7000 genes identified. We sought to expand our understanding of this organism at the molecular level by identifying and annotating novel protein domains. Protein domains are the evolutionarily conserved units from which proteins are built.

Results
Two automated methods were used to rapidly generate an optimised set of targets, which were then analysed manually. This yielded a final set of 37 domains or structural repeats, occurring 204 times in the genome. Using these families enabled us to correlate information from many different resources. Several of them immediately enhance our understanding both of S. coelicolor and of general bacterial molecular mechanisms, including the regulation of cell wall biosynthesis and the maintenance of streptomycete telomeres.

Discussion
Delineating protein domain families enables detailed analysis of protein function and the identification of regions or residues likely to be of particular interest. This kind of prior in silico analysis can therefore increase the rate of discovery in the laboratory. Furthermore, we demonstrate that this type of in silico method can fairly rapidly generate new biological information from previously uncorrelated data.
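A domain family can point to residues of particular interest. One simple way to show this is a column-conservation check over the family's multiple sequence alignment. The sketch below is a generic illustration only and is not the method used in this work. It assumes the alignment is supplied as equal-length strings, with "-" marking gaps. The function name `conserved_columns`, its `threshold` parameter and the toy sequences are all hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based alignment columns whose most common residue occurs in at
    least `threshold` of all sequences (hypothetical helper; gaps count against
    conservation because the denominator is the total number of sequences)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    result = []
    for col in range(length):
        # Collect the non-gap residues observed in this column.
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            continue
        _, count = Counter(residues).most_common(1)[0]
        if count / n_seqs >= threshold:
            result.append(col)
    return result


# Toy alignment of made-up sequences (not real S. coelicolor data).
toy = [
    "MKT-GHC",
    "MRTAGHC",
    "MKSAGHC",
]
print(conserved_columns(toy))  # → [0, 4, 5, 6]
```

With the default `threshold=1.0`, only invariant columns are reported. Lowering the threshold, for example to 0.6, also admits columns where most but not all members share a residue. This mirrors the looser notion of "likely" residues of interest.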