Quantitative model for genome-wide cyclic AMP receptor protein binding site identification and characteristic analysis

Abstract Cyclic AMP receptor proteins (CRPs) are important transcription regulators in many species. Prediction of CRP-binding sites has mainly been based on position-weighted matrices (PWMs). Traditional prediction methods consider only known binding motifs, so their ability to discover flexible binding patterns is limited. We therefore developed a novel CRP-binding site prediction model, CRPBSFinder, which combines a hidden Markov model, knowledge-based PWMs and structure-based binding affinity matrices. We trained the model on validated CRP-binding data from Escherichia coli and evaluated it with computational and experimental methods. The results show that the model not only achieves higher prediction performance than a classic method but also quantitatively indicates the binding affinity of transcription factor binding sites through its prediction scores. The predictions included not only most of the known regulated genes but also 1089 novel CRP-regulated genes. The major regulatory roles of CRPs were divided into four classes: carbohydrate metabolism, organic acid metabolism, nitrogen compound metabolism and cellular transport. Several novel functions were also discovered, including heterocycle metabolism and response to stimulus. Based on the functional similarity of homologous CRPs, we applied the model to 35 other species. The prediction tool and the prediction results are available online at: https://awi.cuhk.edu.cn/~CRPBSFinder.
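
For readers unfamiliar with PWM-based scanning, the classic approach that the abstract contrasts with CRPBSFinder can be sketched as follows. This is a minimal illustration only and does not reproduce CRPBSFinder: it omits the hidden Markov model and the structure-based affinity matrices. The aligned sites in `TOY_SITES` are invented for the example, the uniform background and pseudocount are assumptions, and all function names (`build_pwm`, `score_window`, `scan`, `best_hit`) are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites.

    Assumes a uniform background frequency and an additive pseudocount.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        # One dict per position: base -> log2(observed frequency / background)
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan(pwm, sequence):
    """Score every window of PWM width; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, score_window(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

def best_hit(pwm, sequence):
    """Return the (start, score) pair with the highest score."""
    return max(scan(pwm, sequence), key=lambda hit: hit[1])

# Invented toy alignment, NOT real CRP-binding data.
TOY_SITES = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
TOY_PWM = build_pwm(TOY_SITES)
```

Calling `best_hit(TOY_PWM, "AAATGTGACC")` returns the start index and score of the best-matching window. A scanner of this kind can only score windows against the motif it was built from, which is the limitation the abstract attributes to traditional methods.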
