Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by Genetic Algorithm

Identification of transcription factor binding sites (TFBSs), or motifs, plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose and develop a novel sampling-based motif finding method, which we call Motif GibbsGA, whose key feature is the combination of a Gibbs sampler with optimization of the position-specific frequency matrix (PSFM) by a genetic algorithm. The motif is modeled as a PSFM, and a greedy strategy is used to choose its initial parameters. A Gibbs sampler is then built on the PSFM model, and during sampling the PSFM is improved by the genetic algorithm. A post-processing step that adaptively adds and removes sites handles the general case of an arbitrary number of instances per sequence. As a result, Motif GibbsGA can discover several different motifs with differing numbers of occurrences in a single dataset. We test our method on the benchmark dataset compiled by Tompa et al. (2005) for assessing computational tools that predict TFBSs. On this dataset, the performance of Motif GibbsGA compares well with, and in many cases exceeds, that of existing tools. We attribute this in part to the significant role the genetic algorithm plays in improving the PSFM.
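The following is a minimal Python sketch of the pipeline the abstract outlines, not the authors' implementation. It assumes exactly one motif instance per sequence, a fixed motif width, and a uniform background. The function names (`build_psfm`, `gibbs_sweep`, `fitness`, `mutate`, `crossover`, `ga_refine`) are hypothetical, and the GA fitness function and operators are illustrative assumptions, since the abstract does not specify the ones Motif GibbsGA uses. For brevity, random initial sites stand in for the greedy initialization, the GA is run once after sampling rather than interleaved with it, and the adaptive adding/removing post-processing is omitted.

```python
import math
import random

ALPHABET = "ACGT"


def build_psfm(sites, pseudocount=0.5):
    """Position-specific frequency matrix: one {base: probability} dict per column."""
    width = len(sites[0])
    psfm = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        psfm.append({b: c / total for b, c in counts.items()})
    return psfm


def log_odds(psfm, kmer, background=0.25):
    """Log-likelihood ratio of a k-mer under the PSFM versus a uniform background."""
    return sum(math.log(col[b] / background) for col, b in zip(psfm, kmer))


def gibbs_sweep(seqs, starts, width, rng):
    """One sweep of a site sampler, assuming exactly one site per sequence."""
    starts = list(starts)
    for i, seq in enumerate(seqs):
        # Estimate the PSFM from the current sites of all *other* sequences.
        others = [s[p:p + width] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        psfm = build_psfm(others)
        # Resample this sequence's site with probability proportional to its likelihood ratio.
        weights = [math.exp(log_odds(psfm, seq[p:p + width]))
                   for p in range(len(seq) - width + 1)]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def fitness(psfm, seqs):
    """Hypothetical GA fitness: summed best-window log-odds score over all sequences."""
    w = len(psfm)
    return sum(max(log_odds(psfm, s[p:p + w]) for p in range(len(s) - w + 1))
               for s in seqs)


def mutate(psfm, rng, sigma=0.1):
    """Perturb one random column with Gaussian noise, then renormalize it."""
    new = [dict(col) for col in psfm]
    j = rng.randrange(len(new))
    noisy = {b: max(1e-3, p + rng.gauss(0, sigma)) for b, p in new[j].items()}
    total = sum(noisy.values())
    new[j] = {b: v / total for b, v in noisy.items()}
    return new


def crossover(a, b, rng):
    """Uniform crossover: each column is copied from one parent chosen at random."""
    return [dict(ca if rng.random() < 0.5 else cb) for ca, cb in zip(a, b)]


def ga_refine(psfm, seqs, rng, pop_size=20, generations=20):
    """Evolve a population of PSFMs seeded from the sampler's current PSFM."""
    population = [psfm] + [mutate(psfm, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, seqs), reverse=True)
        parents = population[:pop_size // 2]  # elitist truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda m: fitness(m, seqs))


if __name__ == "__main__":
    rng = random.Random(0)
    motif = "TGACGT"  # planted toy motif, not taken from the paper
    seqs = []
    for _ in range(8):
        bg = "".join(rng.choice(ALPHABET) for _ in range(40))
        p = rng.randrange(len(bg) - len(motif) + 1)
        seqs.append(bg[:p] + motif + bg[p + len(motif):])
    # Random initial sites stand in for the greedy initialization in the abstract.
    starts = [rng.randrange(len(s) - len(motif) + 1) for s in seqs]
    for _ in range(50):
        starts = gibbs_sweep(seqs, starts, len(motif), rng)
    psfm = build_psfm([s[p:p + len(motif)] for s, p in zip(seqs, starts)])
    refined = ga_refine(psfm, seqs, rng)
    print("before GA:", round(fitness(psfm, seqs), 2),
          "after GA:", round(fitness(refined, seqs), 2))
```

The sampler step follows the one-site-per-sequence Gibbs site sampler of [11]. Because the GA keeps the top half of each generation (elitism), the refined PSFM's fitness can never fall below that of the PSFM it was seeded with.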

[1] Arlindo L. Oliveira, et al. An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance, 2007, BMC Bioinformatics.

[2] Bin Li, et al. Limitations and potentials of current motif discovery algorithms, 2005, Nucleic Acids Research.

[3] Jie Liu, et al. GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach, 2008, BMC Bioinformatics.

[4] Z. Weng, et al. Finding functional sequence elements by multiple local alignment, 2004, Nucleic Acids Research.

[5] Xiaodong Wang, et al. A profile-based deterministic sequential Monte Carlo algorithm for motif discovery, 2008, Bioinformatics.

[6] C. Elkan, et al. Unsupervised learning of multiple motifs in biopolymers using expectation maximization, 1995, Machine Learning.

[7] Aaron Golden, et al. Transcription factor binding site identification using the self-organizing map, 2005, Bioinformatics.

[8] Siu-Ming Yiu, et al. MotifVoter: a novel ensemble method for fine-grained integration of generic motif finders, 2008, Bioinformatics.

[9] Douglas L. Brutlag, et al. BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes, 2000, Pacific Symposium on Biocomputing.

[10] Marie-France Sagot, et al. Efficient representation and P-value computation for high-order Markov motifs, 2008, ECCB.

[11] Jun S. Liu, et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, 1993, Science.

[12] Kwong-Sak Leung, et al. TFBS identification based on genetic algorithm with combined representations and adaptive post-processing, 2008, Bioinformatics.

[13] Michael Gribskov, et al. Combining evidence using p-values: application to sequence homology searches, 1998, Bioinformatics.

[14] Gary D. Stormo, et al. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, 1999, Bioinformatics.

[15] Kathleen Marchal, et al. A Gibbs sampling method to detect over-represented motifs in the upstream regions of co-expressed genes, 2001, RECOMB.

[16] Zhi Wei, et al. GAME: detecting cis-regulatory elements using a genetic algorithm, 2006, Bioinformatics.

[17] Z. Weng, et al. Detection of functional DNA motifs via statistical over-representation, 2004, Nucleic Acids Research.

[18] William Stafford Noble, et al. Assessing computational tools for the discovery of transcription factor binding sites, 2005, Nature Biotechnology.

[19] W. J. Kent, et al. Environmentally Induced Foregut Remodeling by PHA-4/FoxA and DAF-12/NHR, 2004, Science.

[20] Jiao Licheng, et al. Motif GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by Genetic Algorithm, 2010.

[21] G. Clark, et al. Reference, 2008.

[22] Yu Liang, et al. Bioinformatics, doi:10.1093/bioinformatics/btm080, 2022.