ApicoTFdb: the comprehensive web repository of apicomplexan transcription factors and transcription-associated co-factors

Abstract Despite significant progress in apicomplexan genome sequencing and genomics, the current list of experimentally validated transcription factors (TFs) in these genomes is incomplete. It consists mainly of AP2-family proteins, with only a limited number of non-AP2 TFs and transcription-associated co-factors (TcoFs). We performed a systematic, bioinformatics-aided prediction of TFs and TcoFs in apicomplexan genomes and developed the ApicoTFdb database, which contains experimentally validated as well as computationally predicted TFs and TcoFs from 14 apicomplexan species. The predicted TFs were manually curated to complement the existing annotations. The current version of the database includes 1292 experimentally validated and computationally predicted TFs, representing 20 distinct families across the 14 species. The predictions include members of the TUB, NAC, BSD, HTH, Cupin/Jumonji, winged-helix and FHA protein families that had not previously been reported as TFs in these genomes. In addition to TFs, ApicoTFdb classifies TcoFs into three main subclasses, TRs, CRRs and RNARs. In total, 2491 TcoFs from the 14 species are analyzed in this study. The database is designed to integrate different tools for comparative analysis. Every entry in the database is dynamically linked to other databases, literature references, protein–protein interactions, pathways and annotations associated with the protein. ApicoTFdb will be useful to researchers interested in the less-studied gene regulatory mechanisms that mediate the complex life cycle of apicomplexan parasites. The database will also aid the discovery of novel drug targets, which are much needed to combat growing drug resistance in these parasites.
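The abstract does not spell out the prediction pipeline. The following Python sketch shows how domain-based TF prediction is commonly approached. It assigns candidate TF families to proteins from pre-computed domain hits, such as those parsed from HMMER or InterProScan output. The parsing step is not shown.

Several parts of the sketch are assumptions made for illustration:

* the domain-to-family table,
* the function names `classify_tfs` and `count_by_family`,
* the protein IDs in the usage example.

This is not ApicoTFdb's actual rule set. Any candidates it produces would still need the manual curation mentioned above.

```python
from collections import defaultdict

# Hypothetical, illustrative mapping from Pfam-style domain names to TF families.
# This is NOT ApicoTFdb's curated rule set; it only demonstrates the idea.
DOMAIN_TO_FAMILY = {
    "AP2": "AP2",
    "Tub": "TUB",
    "NAM": "NAC",
    "BSD": "BSD",
    "FHA": "FHA",
    "JmjC": "Cupin/Jumonji",
}


def classify_tfs(domain_hits):
    """Assign each protein to the TF families implied by its domain hits.

    domain_hits: iterable of (protein_id, domain_name) pairs, e.g. parsed
    from HMMER or InterProScan output (parsing not shown).
    Returns a dict mapping protein_id -> sorted list of family names.
    Proteins without any TF-associated domain are omitted.
    """
    families = defaultdict(set)
    for protein_id, domain in domain_hits:
        family = DOMAIN_TO_FAMILY.get(domain)
        if family is not None:
            # A set collapses repeated hits of the same domain in one protein.
            families[protein_id].add(family)
    return {pid: sorted(fams) for pid, fams in families.items()}


def count_by_family(assignments):
    """Count proteins per family; a multi-family protein counts once in each."""
    counts = defaultdict(int)
    for fams in assignments.values():
        for fam in fams:
            counts[fam] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical protein IDs and domain hits.
    hits = [
        ("protA", "AP2"), ("protA", "AP2"),
        ("protB", "Tub"),
        ("protC", "Pkinase"),  # no TF-associated domain, so it is dropped
        ("protD", "FHA"), ("protD", "JmjC"),
    ]
    assignments = classify_tfs(hits)
    print(assignments)
    print(count_by_family(assignments))
```

A plain lookup table keeps the family rules explicit and easy to review. A real pipeline would likely add E-value or coverage thresholds on the domain hits, and possibly nuclear-localization filtering, before curation.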
