Discovery of the principal specific transcription factors of Apicomplexa and their implication for the evolution of the AP2-integrase DNA binding domains

The comparative genomics of apicomplexans, such as the malarial parasite Plasmodium, the cattle parasite Theileria and the emerging human parasite Cryptosporidium, has suggested an unexpected paucity of specific transcription factors (TFs) with DNA binding domains that are closely related to those found in the major families of TFs from other eukaryotes. This apparent lack of specific TFs is paradoxical, given that the apicomplexans show a complex developmental cycle in one or more hosts and a reproducible pattern of differential gene expression in the course of this cycle. Using sensitive sequence profile searches, we show that the apicomplexans possess a lineage-specific expansion of a novel family of proteins with a version of the AP2 (Apetala2)-integrase DNA binding domain, which is present in numerous plant TFs. About 20–27 members of this apicomplexan AP2 (ApiAP2) family are encoded in different apicomplexan genomes, with each protein containing one to four copies of the AP2 DNA binding domain. Using gene expression data from Plasmodium falciparum, we show that guilds of ApiAP2 genes are expressed in different stages of intraerythrocytic development. By analogy to the plant AP2 proteins and based on the expression patterns, we predict that the ApiAP2 proteins are likely to function as previously unknown specific TFs in the apicomplexans and regulate the progression of their developmental cycle. In addition to the ApiAP2 family, we also identified two other novel families of AP2 DNA binding domains in bacteria and transposons. Using structure similarity searches, we also identified divergent versions of the AP2-integrase DNA binding domain fold in the DNA binding region of the PI-SceI homing endonuclease and the C-terminal domain of the pleckstrin homology (PH) domain-like modules of eukaryotes.
Integrating these findings, we present a reconstruction of the evolutionary scenario of the AP2-integrase DNA binding domain fold, which suggests that it underwent multiple independent combinations with different types of mobile endonucleases or recombinases. It appears that the eukaryotic versions have emerged from versions of the domain associated with mobile elements, followed by independent lineage-specific expansions, which accompanied their recruitment to transcription regulation functions.
