Sequence analysis of an exceptionally conserved operon suggests enzymes for a new link between histidine and purine biosynthesis

Recently, a highly conserved protein family has been described, with representatives in all three major domains of life: Bacteria, Archaea, and Eukaryotes (Sivasubramaniam et al., 1995, Plant Mol Biol 29: 173–178; Braun et al., 1996, J Bacteriol 178: 6865–6872; Das et al., 1997, Nature 385: 29–30). The overall sequence similarity among the proteins in this family (hereafter referred to as the Snz family; Braun et al., 1996, ibid.) is comparable to that in the most conserved known protein families, such as DnaK-related molecular chaperones, the translation factors EF-Tu and EF-G, enolases, and glyceraldehyde-3-phosphate dehydrogenases. This high level of sequence conservation indicates that the Snz proteins have one or more important physiological roles. However, no sequence similarity to proteins of known activity or to known conserved sequence motifs has been reported for the Snz proteins, and so far there has been no clue to their function beyond the observation that some of them are stress induced (Mitchell et al., 1992, Mol Microbiol 6: 1579–1581; Sivasubramaniam et al., 1995, ibid.; Braun et al., 1996, ibid.). Here we present the results of a detailed sequence analysis of the Snz proteins, showing that they contain a phosphate-binding motif typical of the β/α-barrel protein superfamily and may catalyse one of the reactions linking histidine and purine biosynthesis. A search of GenBank using the WUBLASTP program, which is based on the BLAST2 algorithm (Altschul and Gish, 1996, Meth Enzymol 266: 460–481), revealed statistically significant similarity (P < 0.001) between the Snz protein from Methanococcus jannaschii and bacterial ThiG proteins, which are involved in thiazole synthesis (Vander Horn et al., 1993, J Bacteriol 175: 982–992), and a lower, but still significant, similarity (P = 0.0034) to tryptophan synthase (TrpA) from Methanobacterium thermoautotrophicum.
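The P-values quoted above come from BLAST-style (Karlin–Altschul) statistics. As an illustration only, the sketch below shows the single-alignment form of these statistics: the expected number of chance hits E = K·m·n·e^(−λS) for a raw score S, query length m and database length n, and the Poisson relation P = 1 − e^(−E) linking E to P. The function names are hypothetical, the λ and K values and sequence sizes in the usage section are assumptions chosen for illustration, and WUBLASTP's reported P-values also incorporate sum statistics over multiple alignments, which this sketch omits.

```python
import math


def expected_hits(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S).

    m is the query length, n the total database length (residues), and
    lam/k are the scoring-system parameters (assumed to be known here).
    """
    return k * m * n * math.exp(-lam * raw_score)


def p_from_e(e_value: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e_value)  # equals 1 - exp(-E), accurate for small E


def e_from_p(p_value: float) -> float:
    """Invert p_from_e: E = -ln(1 - P), defined for 0 <= P < 1."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("P must lie in [0, 1)")
    return -math.log1p(-p_value)


if __name__ == "__main__":
    # Convert the P-value thresholds mentioned in the text to expectations.
    for p in (0.001, 0.0034, 0.3):
        print(f"P = {p:<7} ->  E = {e_from_p(p):.4f}")

    # Hypothetical example: how E shrinks as the raw score rises.
    # lam = 0.3176 and k = 0.134 are commonly quoted ungapped BLOSUM62
    # values; the query and database sizes are invented for illustration.
    for score in (40, 50, 60):
        print(score, expected_hits(score, m=300, n=10_000_000, lam=0.3176, k=0.134))
```

For P-values as small as those reported for ThiG and TrpA, E and P are nearly identical, whereas at P = 0.3 the expected number of chance hits is noticeably larger (about 0.36), which is one way to see why such weaker matches need independent support.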
Additionally, other Snz proteins showed weaker similarity (P < 0.3) to indole-3-glycerol phosphate synthase (TrpC), phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (HisA), and imidazoleglycerol phosphate synthase/cyclase (HisF), enzymes involved in tryptophan and histidine biosynthesis. All of these enzymes have similar substrates containing a glycerophosphate moiety, and they were recently shown to contain conserved phosphate-binding sites shared with a large variety of proteins of the β/α-barrel superfamily (Bork et al., 1995, Protein Sci 4: 268–274). A multiple sequence alignment of the Snz proteins with ThiG, HisA, HisF, TrpA, and TrpC, generated with the MACAW program (Schuler et al., 1991, Proteins Struct Funct Genet 9: 180–190), shows maximal conservation in the region containing the phosphate-binding motif, and the consensus features of this motif, as described by Bork et al. (1995, ibid.), are present in the Snz proteins (Fig. 1). An additional database search with the MOST program (Tatusov et al., 1994, Proc Natl Acad Sci USA 91: 12091–12095) for segments similar to the putative phosphate-binding motif conserved in the Snz, ThiG, TrpC, HisA, and TrpA proteins converged on the β/α-barrel superfamily when a restrictive cut-off of r < 0.001 was applied. Furthermore, even when an alignment block containing only the Snz and ThiG sequences was used in a MOST search, 214 sequences were retrieved, all of which belong to this superfamily. Secondary-structure prediction for the Snz proteins with the PHD program (Rost, 1996, Meth Enzymol 266: 525–539) indicated that the region of the putative phosphate-binding site consists of a short β-strand followed by a loop and a helix (Fig. 1), resembling the corresponding region in β/α-barrel proteins of experimentally determined 3-D structure, namely glycolate oxidase (GOX_SPIOL), tryptophan synthase (TRPA_SALTY), and inosine 5′-monophosphate dehydrogenase (IMD2_MESAU). This region, particularly the helix, is exposed on the surface of the protein molecule. The overall secondary-structure prediction (Rost, 1996, ibid.) for the Snz proteins is compatible with the pattern typical of the β/α-barrel superfamily (data not shown). Based on these observations, we conclude that the Snz proteins contain a phosphate-binding site typical of the β/α-barrel superfamily, that they probably share the overall 3-D fold of β/α-barrel proteins, and that they may possess an enzymatic activity involving phosphorylated polyols, perhaps glycerophosphate derivatives or sugar phosphates. This is consistent with the observation that YAAD_BACSU binds GTP (Mitchell et al., 1992, ibid.).
In Bacillus subtilis and Haemophilus influenzae, the genes encoding Snz proteins (hereafter referred to as SnzA) are immediately followed by, and apparently form operons with, genes encoding smaller proteins (SnzB) that also show a high degree of sequence similarity to one another (Fig. 2). Remarkably, each of the three yeast snzA genes is also accompanied by an snzB gene, but the snzB genes are located on the opposite DNA strand and are separated from the snzA genes by 394, 449, and 391 bp, respectively.
Molecular Microbiology (1997) 24(2), 443–445
In the M. jannaschii