Moitf GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by GA

Identification of transcription factor binding sites (TFBSs) or motifs plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose and develop a novel sampling based motif finding method coupled with PSFM optimization by genetic algorithm, which we call Motif GibbsGA. One significant feature of Motif GibbsGA is the combination of Gibbs sampling and PSFM optimization by genetic algorithm. Based on position-specific frequency matrix (PSFM) motif model, a greedy strategy for choosing the initial parameters of PSFM is employed. Then a Gibbs sampler is built with respect to PSFM model. During the sampling process, PSFM is improved via a genetic algorithm. A post-processing with adaptive adding and removing is used to handle general cases with arbitrary numbers of instances per sequence. We test our method on the benchmark dataset compiled by Tompa et al. for assessing computational tools that predict TFBSs. The performance of Motif GibbsGA on the data set compares well to, and in many cases exceeds, the performance of existing tools. This is in part attributed to the significant role played by the genetic algorithm which has improved PSFM.

[1]  Siu-Ming Yiu,et al.  MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders , 2008, Bioinform..

[2]  Douglas L. Brutlag,et al.  BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes , 2000, Pacific Symposium on Biocomputing.

[3]  Z. Weng,et al.  Detection of functional DNA motifs via statistical over-representation. , 2004, Nucleic acids research.

[4]  Kathleen Marchal,et al.  A Gibbs sampling method to detect over-represented motifs in the upstream regions of co-expressed genes , 2001, RECOMB.

[5]  Charles Elkan,et al.  Unsupervised learning of multiple motifs in biopolymers using expectation maximization , 1995, Mach. Learn..

[6]  Bin Li,et al.  Limitations and potentials of current motif discovery algorithms , 2005, Nucleic acids research.

[7]  Z. Weng,et al.  Finding functional sequence elements by multiple local alignment. , 2004, Nucleic acids research.

[8]  Gary D. Stormo,et al.  Identifying DNA and protein patterns with statistically significant alignments of multiple sequences , 1999, Bioinform..

[9]  Michael Gribskov,et al.  Combining evidence using p-values: application to sequence homology searches , 1998, Bioinform..

[10]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[11]  Zhi Wei,et al.  GAME: detecting cis-regulatory elements using a genetic algorithm , 2006, Bioinform..

[12]  Aaron Golden,et al.  Transcription factor binding site identification using the self-organizing map , 2005, Bioinform..

[13]  Marie-France Sagot,et al.  Efficient representation and P-value computation for high-order Markov motifs , 2008, ECCB.

[14]  Arlindo L. Oliveira,et al.  An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance , 2007, BMC Bioinformatics.

[15]  Jie Liu,et al.  GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach , 2008, BMC Bioinformatics.

[16]  Xiaodong Wang,et al.  A profile-based deterministic sequential Monte Carlo algorithm for motif discovery , 2008, Bioinform..

[17]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[18]  W. J. Kent,et al.  Environmentally Induced Foregut Remodeling by PHA-4/FoxA and DAF-12/NHR , 2004, Science.

[19]  Kwong-Sak Leung,et al.  TFBS identification based on genetic algorithm with combined representations and adaptive post-processing , 2008, Bioinform..

[20]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[21]  Yu Liang,et al.  BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm080 Sequence analysis , 2022 .