Identification of linker regions and domain borders of the transcription activator protein NtrC from Escherichia coli by limited proteolysis, in-gel digestion, and mass spectrometry.

We have developed a mass spectrometry-based method for the identification of linker regions and domain borders in multidomain proteins. This approach combines limited proteolysis and in-gel proteolytic digestion and was applied to the determination of linkers in the transcription factor NtrC from Escherichia coli. Limited proteolysis of NtrC with thermolysin and papain revealed that initial digestion yielded two major bands in SDS-PAGE that were identified by mass spectrometry as the R-domain and the still covalently linked OC-domains. Subsequent steps in limited proteolysis afforded further cleavage of the OC-fragment into the O- and the C-domain at accessible amino acid residues. Mass spectrometric identification of the tryptic/thermolytic peptides obtained after in-gel total proteolysis of the SDS-PAGE-separated domains determined the domain borders and showed that the protease-accessible linker between the R- and O-domains comprised amino acids Val-131 and Gln-132 within the "Q-linker", in agreement with papain and subtilisin digestion. The region between amino acid residues Thr-389 and Gln-396 marked the hitherto unknown linker sequence that connects the O-domain with the C-domain. High abundances of proline, alanine, serine, and glutamic acid residues were found in this linker structure (PASE-linker) of related NtrC response regulator proteins. While the R- and C-domains remained stable under the applied limited proteolysis conditions, the O-domain was further truncated, yielding a core fragment that comprised the sequence from Ile-140 to Arg-320. ATPase activity was lost after separation of the R-domain from the OC-fragment. However, binding of the OC- and C-fragments to specific DNA was observed by characteristic band-shifts in migration retardation assays, indicating an intact tertiary structure of the C-domain.
The outlined strategy proved to be highly efficient and afforded lead information on tertiary structural features necessary for protein design and engineering and for structure-function studies.
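
To illustrate how peptide masses from an in-gel tryptic digest can be mapped back onto a protein sequence to bracket a domain fragment, the following is a minimal sketch and not the procedure used in this work. The toy sequence, the mass tolerance, and all function names are hypothetical. It considers only fully tryptic peptides without missed cleavages, so it would not by itself detect the non-tryptic termini created by the limited-proteolysis cut.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # charge carrier for the singly protonated ion


def tryptic_peptides(seq):
    """Yield (start, end, peptide) with 1-based inclusive positions.

    Cleaves C-terminal to K or R, except when the next residue is P.
    """
    start = 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            yield start + 1, i + 1, seq[start:i + 1]
            start = i + 1


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def covered_range(seq, observed_mz, tol=0.02):
    """Return (first, last) residue covered by matched peptides, or None.

    tol is an absolute tolerance in Da (an assumed value for illustration).
    """
    hits = [
        (start, end)
        for start, end, pep in tryptic_peptides(seq)
        if any(abs(mh_plus(pep) - mz) <= tol for mz in observed_mz)
    ]
    if not hits:
        return None
    return min(s for s, _ in hits), max(e for _, e in hits)


if __name__ == "__main__":
    toy_sequence = "GASPKVTLRNDQKEMHFR"  # hypothetical, not NtrC
    # Pretend only the two internal peptides were observed for a gel band.
    observed = [mh_plus("VTLR"), mh_plus("NDQK")]
    print(covered_range(toy_sequence, observed))
```

In this sketch, the first and last residues covered by matched peptides give a conservative estimate of the fragment boundaries. Narrowing these to the actual cleavage site would additionally require matching peptides with one non-tryptic terminus.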