Rat hepatic cytosolic phosphoenolpyruvate carboxykinase (GTP). Structures of the protein, messenger RNA, and gene.

The primary structure of the messenger RNA coding for cytosolic phosphoenolpyruvate carboxykinase was determined by sequencing cDNA and genomic DNA and by primer extension of the mRNA. The molecule is 2624 nucleotides in length; this includes 143 nontranslated nucleotides at the 5' end and 615 nontranslated nucleotides at the 3' end. The 3' nontranslated sequence contains a 102-base pair region of alternating purine-pyrimidine nucleotides (the majority of which are UpG dinucleotides), several direct repeats and palindromic sequences, and 8 CpG dinucleotides. The corresponding segment of the phosphoenolpyruvate carboxykinase gene thus has characteristics that favor the formation of Z-DNA. The amino acid sequence of phosphoenolpyruvate carboxykinase was deduced from the mRNA sequence and confirmed by fast atom bombardment mass spectrometric analysis of peptides generated with trypsin and Staphylococcus aureus V8 protease. The protein consists of 621 amino acids and has a molecular weight of 69,289. Charon 4A lambda bacteriophage clones containing genomic DNA coding for phosphoenolpyruvate carboxykinase were isolated from a library of partial HaeIII digests of rat liver DNA. Two clones, lambda PC112 and lambda PC103, contained the entire coding region in 15-kilobase inserts and were used to subclone the gene into pBR322 as EcoRI, BamHI, or SstI-KpnI fragments. Using these subclones, the structure of the phosphoenolpyruvate carboxykinase gene was determined by S1 nuclease mapping, R-loop analysis, and DNA sequencing. The gene is composed of 10 exons and 9 introns with a total length of 6.0 kilobases. The transcription initiation site of the gene was determined by a combination of in vitro transcription in a HeLa cell lysate system, primer extension of phosphoenolpyruvate carboxykinase mRNA, and S1 nuclease mapping. In vitro transcription of purified DNA templates revealed three RNA polymerase II-dependent start sites. Two sites were separated by 600 base pairs on the coding strand and the third site was on the noncoding strand. The products of S1 nuclease mapping and primer extension from a BglII site were compared to determine which of the coding strand initiation sites was expressed in vivo. In both cases a 69-base pair fragment was generated, and its 5' end corresponded to a thymidine residue identified in a sequence ladder of the genomic DNA coding strand. We conclude that synthesis of phosphoenolpyruvate carboxykinase mRNA initiates with an adenine residue 69 base pairs 5' of the BglII site; this corresponds to the 3'-most transcription initiation site determined in vitro.
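
The lengths reported above can be checked against one another with simple arithmetic. The sketch below is an illustration only, not part of the original analysis. It assumes that the coding region consists of one codon per amino acid of the 621-residue protein plus a single stop codon, and that the reported mRNA length excludes any poly(A) tail; all variable names are hypothetical.

```python
# Consistency check of the sequence lengths quoted in the abstract.
# Assumptions (not stated in the source): the coding region is one codon
# per residue plus a single stop codon, and the reported mRNA length
# does not include a poly(A) tail.

FIVE_PRIME_UTR = 143              # nontranslated nucleotides at the 5' end
THREE_PRIME_UTR = 615             # nontranslated nucleotides at the 3' end
PROTEIN_LENGTH = 621              # amino acids in the deduced protein
REPORTED_MRNA_LENGTH = 2624       # nucleotides
REPORTED_MOLECULAR_WEIGHT = 69_289

# Three nucleotides per codon; the extra codon is the assumed stop codon.
coding_region = 3 * (PROTEIN_LENGTH + 1)

# Predicted total length: 5' nontranslated + coding + 3' nontranslated.
predicted_mrna_length = FIVE_PRIME_UTR + coding_region + THREE_PRIME_UTR

# Average mass per residue implied by the reported molecular weight.
mean_residue_mass = REPORTED_MOLECULAR_WEIGHT / PROTEIN_LENGTH

print("coding region (nt):", coding_region)
print("predicted mRNA length (nt):", predicted_mrna_length)
print("matches reported length:", predicted_mrna_length == REPORTED_MRNA_LENGTH)
print("mean residue mass:", round(mean_residue_mass, 1))
```

Under these assumptions the nontranslated and coding segments sum to the reported 2624 nucleotides, and the implied mean residue mass is in the range typical of proteins.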