Regulatory sequences involved in the translation of Neurospora crassa mRNA: Kozak sequences and stop codons

Jon J.P. Bruchez, J. Eberle and V.E.A. Russo
Max Planck Institut für Molekulare Genetik, Ihnestraße 73, D-14195 Berlin, Germany

Creative Commons License: This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License. This regular paper is available in Fungal Genetics Reports: http://newprairiepress.org/fgr/vol40/iss1/3 (DOI: 10.4148/1941-4765.1394). Published by New Prairie Press, 2017.

We have analyzed the sequences of 77 nuclear genes of N. crassa thought to be transcribed by RNA polymerase II (References 1-72), which should represent virtually all of the presently published nuclear gene sequences for this fungus. Kozak (1988, Nucl. Acids Res. 15:8125-) analyzed 699 vertebrate genes, leading to the identification of the vertebrate consensus sequence for initiation of translation, or Kozak sequence:

G_44 C_39 C_53 (A_61/G_36) (C_49/A_27) C_55 A_100 T_100 G_100 G_46

We show here that the N. crassa Kozak sequence is:

C_57 N N N C_77 A_81 (A_44/C_43) "T"_3 A_99 T_100 G_99 G_51 C_53

where the subscript number (given after the underscore) indicates the % occurrence of the particular nucleotide and "T" indicates the conserved absence of that particular nucleotide. We arbitrarily decided that a nucleotide was to be included in the consensus only if it was present in at least 50% of all the sequences analyzed. If two nucleotides, each represented at less than 50%, gave a summed total of at least 75% representation for a single position, then both are shown in brackets.

Table I. Consensus for initiation of translation and stop codons in Neurospora crassa

Consensus: CNNNCAATGGC

No.  Ref.   Gene     Distance from +1  Kozak sequence   Stop codon
                     to ATG (bases)
1    1      acp      46                AATATCACAATGGCG  TAA
2    2      acu-3                      CTGCCCATCATGGCT  TAG
3    3      acu-5    103               ATACGAGTTATGGCG  TAA
4    4      acu-8                      TCACCAACCATGGCG  TAA
5    5      acu-9    60                CTTTTCACAATGGCT  TAA
6    6      al-1                       ACAGACAAAATGGCT  TAG
7    7      al-3     90                CACGTCACCATGGCC  TGA
8    8      alc      54                TCCCTCACCATGACC  TAA
9    9      am       109               ACCTTCAAAATGTCT  TAA
10   10     arg-2    118               CAAGTCAAGATGTTC  TAA
11   11     atp-1    90                CTCCACAACATGTTC  TAA
12   11     atp-2    58                ATCGTCAAGATGTTC  TAA
13   12     bli-7    110               ACCGCCAAAATGCAG  TAA
14   13     Bml                        ACCGTCAAGATGCGT  TAA
15   14     chs-1    69                TCCGCAACCATGGCG  TGA
16   15     cmt      127               TCTATCAAAATGGGT  TAA
17   16     con-8    221               ACAATAACCATGGAT  TGA
18   17     con-10   91                ATCGTCAACATGGCT  TAG
19   18     con-13   86                CGTCGCAAGATGCCC  TGA
20   19     cot-1                      GGTACCAAGATGGAC  TAA
21   20     cpc-1    622               TCCATCAAGATGCGT  TAA
22   21     cpi                        TTAGTGAAAATGTTT  TAA
23   22     crp-1                      GCAGACAACATGGTA  TAA
24   23     crp-2    62                ACCGTCAAGATGCCC  TGA
25   24     crp-3    58                GCCGGCAAAATGGGT  TAA
26   25     cya-4    146               GCCGCCACCATGCTT  TAA
27   26     cys-3    30                CATGGCACAATGTCT  TAA
28   27     cys-14   32                GACACTCAGATGGCT  TAA
29   28     cyt-2                      TCAGTCGCAATGGGT  TAA
30   29     cyt-18                     TCACATCAAATGCTG  TAA
31   30     cyt-20   57                GTCCTCTGGATGCCG  TAA
32   31     cyt-21   125               CGGTCCAACATGGTT  TGA
33   32     for      66                TCAGTCACCATGTCT  TAA
34   33     frq                        GAAACCTGAGTTGGA  TGA
35   34     grg-1    89                TCAACCAAAATGGAT  TAA
36   35     H3                         ACCATCACAATGGCC  TAA
37   35     H4                         CATATCAAAATGACT  TAA
38   36     his-3    124               GAAAACACCATGGAG  TAA
39   37     hsp30    120               AAGTCAAAAATGGCG  TAA
40   38     ilv-2                      TCCATCACAATGGCC  TAA
41   39     laccase  190               TTTATCACCATGAAA  TAG
42   40     leu-5    146               CACAACGCGATGCCT  TAG
43   41     leu-6    220               TAAACAAACATGGCC  TAA
44   42     lox      123               TCATACAAGATGAAG  TGA
45   43     met-7    98                ATCACAGCCATGCTT  TGA
46   44     mrp-3                      CCTCTCACCATGATC  TAA
47   45     mta-1                      ACCGAAACAATGGAC  TGA
48   46     mtA-1                      AGAAACACGATGTCG  TAG
49   47     nac      162               CCGGTGACAATGACG  TAA
50   48     ncypt1                     TTGCCCATCATGAAC  TAA
51   49     nit-2    284               TGTGCGACAATGGCG  TAA
52   50     nit-3    110               AGCATCATCATGGAG  TGA
53   51     nit-4    39                CCCCGGCAGATGAAC  TGA
54   52     nuc-1                      GCGGGCGTGATGAAC  TAA
55   53     nur22                      ACCGTCAAGATGGCG  TGA
56   54     nur40                      ACTCACAAGATGGCT  TGA
57   55     nur49                      CAAACAACAATGGCG  TAA
58   56     pho-4    145               TCGTTCAAGATGGTT  TGA
59   57+58  pma-1    56                ATAACGCCAATGGCG  TAA
60   59     preg                       GGATTTGTGATGCTG  TAA
61   60     pyr-4    61                ACAGCCAACATGTCG  TAG
62   61     qa-1F    330               AATCCCAACATGCCG  TAG
63   61+62  qa-1S    346               GCCGCCATCATGAAC  TGA
64   61     qa-2     85                CCAAACACAATGGCG  TGA
65   61     qa-3     83                TATATCACCATGTCG  TGA
66   61+63  qa-4     190               CCTTTCGCCATGCCG  TAA
67   61     qa-x     84                TCAGCAGCCATGACA  TGA
68   61     qa-y     133               CGCGTCAAGATGACT  TAA
69   64     sod-1                      TCCGTCAAAATGGTC  TAA
70   65     spe-1    535               TCTTGGGATATGGTT  TAA
71   66     T        94                GCAGCAACCATGAGC  TGA
72   67     trp-1    29                CCAATCACAATGTCG  TAA
73   68     trp-3    147               TCATACACAATGGAG  TAA
74   69     Ubi                        ACCCCCATCATGCAG  TAA
75   70     ucr                        ACCGACACAATGGCG  TAA
76   71     vma-1                      TCGCCCAAGATGGCT  TGA
77   72     vma-2                      TCTTCCACAATGGCC  TAA

Key: a blank entry in the Distance from +1 to ATG (bases) column means that the authors had not determined the +1 position.

The methionine start codon (ATG) is not 100% conserved within the Kozak consensus because, for reasons unknown, the gene frq (Ref. 33) starts its protein sequence with a valine (GTT). It is also interesting to note that the choice of the second codon appears to be limited: about half of the second codons have a guanine in the first position, and about half have a cytosine in the second position. On the whole, our consensus shows a good resemblance to the vertebrate Kozak sequence, with a similar hierarchy of nucleotide preference for a given position, although the degree of preference may be shifted. An exception is the nucleotide position immediately preceding the initiator methionine codon (ATG), where N. crassa exhibits a definite suppression of thymine rather than a positive preference for any one nucleotide.

Fifty genes among the 77 analyzed have a determined mRNA 5' end. When several 5' ends were presented, +1 was taken to be the one most distal from the ATG, unless the authors themselves designated a +1 position.
Defined in this way, the mRNA sequences preceding the ATG have lengths between 30 and 622 bases. The stop codon, determined by computer analysis by the authors, was TAA in 62% of the cases, TGA in 27% and TAG in 11%.
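
The inclusion rule stated for the consensus (a nucleotide is shown if present in at least 50% of sequences; two nucleotides are bracketed if each is below 50% but together they reach at least 75%) can be sketched in Python as follows. This is a minimal illustration, not the procedure the authors actually ran: the function names position_frequencies, consensus_symbol and consensus are hypothetical, positions meeting neither criterion are written as N by assumption, ties are broken alphabetically by assumption, and the "T" notation for a conserved absence is not handled. The demonstration uses only the first five Kozak sequences of Table I, so its output is not the published 77-gene consensus.

```python
from collections import Counter


def position_frequencies(seqs):
    """Return one dict per alignment column mapping nucleotide -> % occurrence."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for column in zip(*seqs):
        counts = Counter(column)
        freqs.append({nt: 100.0 * n / len(seqs) for nt, n in counts.items()})
    return freqs


def consensus_symbol(freq, single=50.0, pair=75.0):
    """Apply the inclusion rule to one column.

    single: minimum % for one nucleotide to enter the consensus.
    pair:   minimum summed % for two sub-threshold nucleotides to be bracketed.
    """
    # Rank by decreasing frequency; ties broken alphabetically (an assumption).
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    top_nt, top_pct = ranked[0]
    if top_pct >= single:
        return top_nt
    if len(ranked) > 1 and top_pct + ranked[1][1] >= pair:
        return "(%s/%s)" % (top_nt, ranked[1][0])
    return "N"  # no nucleotide qualifies at this position


def consensus(seqs):
    """Build the consensus string for equal-length, pre-aligned sequences."""
    return "".join(consensus_symbol(f) for f in position_frequencies(seqs))


# Demonstration on the first five rows of Table I (acp, acu-3, acu-5, acu-8, acu-9).
subset = [
    "AATATCACAATGGCG",
    "CTGCCCATCATGGCT",
    "ATACGAGTTATGGCG",
    "TCACCAACCATGGCG",
    "CTTTTCACAATGGCT",
]
print(consensus(subset))
```

Passing all 77 sequences from Table I to consensus would apply the same thresholds to the full data set; the thresholds are exposed as the single and pair parameters so that the arbitrary 50% and 75% cut-offs can be varied.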