PFP: A Computational Framework for Phylogenetic Footprinting in Prokaryotic Genomes

Phylogenetic footprinting is a widely used approach for the prediction of transcription factor binding sites (TFBSs) through identification of conserved motifs in the upstream sequences of orthologous genes in eukaryotic genomes. However, this popular strategy may not be directly applicable to prokaryotic genomes, where typically about half of the genes in a genome form multiple-gene transcription units, or operons. The promoter sequences for these operons are located in the inter-operonic rather than inter-genic regions, which requires prediction of TFBSs at the level of the transcription unit instead of the individual gene. We have formulated the identification of conserved operons (including both single-gene and multi-gene operons) whose individual gene members are orthologous between two genomes as a bipartite graph matching problem, and we present a graph-theoretic solution. By applying this method to Escherichia coli K12 and 11 of its phylogenetically neighboring species, we have predicted 2,478 sets of conserved operons and discovered potential binding motifs for each of these operons. By comparing the prediction results of our approach with those of other prediction approaches, we conclude that it is advantageous to use our approach for prediction of cis-regulatory binding sites in prokaryotes. The prediction software package PFP is available at http://csbl.bmb.uga.edu/~dongsheng/PFP.
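
To make the bipartite matching formulation concrete, the following is a minimal sketch and not the PFP implementation. It assumes that operons from two genomes form the two sides of a bipartite graph, that an edge joins two operons when at least one pair of their genes is orthologous, and that a maximum-cardinality matching is wanted; the abstract does not specify the edge definition or objective, so all three are assumptions. The function name `match_operons` and the example operon and gene identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def match_operons(
    operons_a: Dict[str, Set[str]],
    operons_b: Dict[str, Set[str]],
    orthologs: Set[Tuple[str, str]],
) -> Dict[str, str]:
    """Return a maximum-cardinality matching of operons in genome A to genome B.

    Assumption (not from the paper): two operons are connected when at least
    one gene of the first is orthologous to a gene of the second.
    """
    # Build the adjacency list of the bipartite graph.
    edges: Dict[str, List[str]] = {}
    for a, genes_a in operons_a.items():
        edges[a] = sorted(
            b
            for b, genes_b in operons_b.items()
            if any((ga, gb) in orthologs for ga in genes_a for gb in genes_b)
        )

    matched_b: Dict[str, str] = {}  # operon in B -> operon in A

    def try_augment(a: str, seen: Set[str]) -> bool:
        # Augmenting-path search (Kuhn's algorithm) starting from operon a.
        for b in edges[a]:
            if b in seen:
                continue
            seen.add(b)
            if b not in matched_b or try_augment(matched_b[b], seen):
                matched_b[b] = a
                return True
        return False

    for a in sorted(operons_a):
        try_augment(a, set())

    return {a: b for b, a in matched_b.items()}


# Hypothetical toy data: two operons per genome.
operons_a = {"opA1": {"a1", "a2"}, "opA2": {"a3"}}
operons_b = {"opB1": {"b1", "b2"}, "opB2": {"b3"}}
orthologs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a3", "b1")}

result = match_operons(operons_a, operons_b, orthologs)
print(result)
```

In this toy case `opA2` has orthologs in both `opB1` and `opB2`, but `opA1` can only pair with `opB1`, so the matching assigns `opA1` to `opB1` and `opA2` to `opB2`. A weighted variant, for example scoring edges by the number of orthologous gene pairs, would be a natural extension but is likewise an assumption here.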
