We have sequenced a region of cloned Xenopus laevis ribosomal DNA encompassing the last 24 nucleotides of the external transcribed spacer and the first 275 nucleotides of the 18S gene. The start of the 18S gene was identified by correlating the results of RNA hybridization and fingerprinting with the DNA sequence. This 5' region of 18S rRNA contains five 2'-O-methyl groups and at least six pseudouridine residues. Several of these modified nucleotides are clustered in a relatively short region spanning nucleotides 99-124. Nucleotides 227-250 constitute a distinctive sequence of 24 consecutive G and C residues. Comparison with the first 160 nucleotides of a yeast 18S gene (25) reveals three blocks of high sequence homology separated by two short tracts where homology is low or absent. The external transcribed spacer sequences diverge widely from within a few nucleotides of the start of the 18S gene.

Nucleic Acids Research. © IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K.

INTRODUCTION
Identification of the start of the 18S gene in ribosomal DNA (rDNA) is a necessary step in studying the structure and biosynthesis of vertebrate ribosomes. Boseley and co-workers (1,2) carried out extensive sequence analysis on the non-transcribed spacer and most of the external transcribed spacer of a cloned unit of X. laevis rDNA. They also sequenced a short, non-contiguous part of the 18S coding region, but the actual start of the 18S gene was not identified. In the present work we have combined DNA sequence analysis with RNA hybridization and fingerprinting to identify the start of the 18S coding region and to characterize the 5' region of 18S rRNA.

METHODS
Which clone to sequence?
X. laevis rDNA consists of several hundred tandemly linked repeating elements, each element consisting of a transcription unit and a non-transcribed spacer.
It may become useful to obtain extensive sequence information from a single transcription unit, rather than to reconstruct a composite sequence from different transcription units. For this purpose it is necessary to use a cloned rDNA sequence that is bounded by restriction sites located outside the transcription unit. Hind III cuts once per rDNA repeat at the site of termination of transcription (3). Thus in Hind III clones each transcription unit remains just intact. R. Reeder (personal communication) has constructed several such clones using the vector pMB9. One of these clones, pXlr101, is in use in several laboratories (see, for example, ref. 4). We have therefore decided to approach sequencing objectives in this laboratory using pXlr101. To facilitate this work we have subcloned various regions of the parent clone. The subclone relevant to the present analysis is called pXlr101A. It contains the rDNA region from the last Bam HI site in the non-transcribed spacer to the Eco RI site in the 18S gene, ligated between the Bam HI and Eco RI sites of pBR322 (figure 1a). The general organization of this region of rDNA is shown in figure 3 of ref. 1. Further details relevant to sequencing are shown in figure 2, below.

[Figure 1: plasmid maps. (a) pXlr101A, 7.4 kb; (b) pXlr14E5, 6.0 kb. Restriction sites labelled on the maps include Eco RI, Bam HI, Pst I and Xba I.]
Figure 1. Diagrams of plasmids pXlr101A and pXlr14E5. The vector (single line) is pBR322 (b) or the large Bam HI/Eco RI fragment of pBR322 (a). The double line is the rDNA insert: open region, part of non-transcribed spacer; shaded region, external transcribed spacer; black region, part of 18S coding region. The arrow indicates the 5' to 3' direction of the "s" strand of rDNA, i.e. the strand whose sequence in the coding region is synonymous with the RNA. Only those restriction sites that are relevant to construction and primary digestions of the plasmids are shown. Further restriction sites are shown in figure 2.

DNA sequence analysis
This was carried out by the method of Maxam and Gilbert (5) with