CUBIC: search for binding sites
Gene transcription is regulated through specific interactions between transcription factors and their binding sites in the upstream region of the regulated gene. Correctly identifying these binding sites is a key challenge in computational biology. Our approach is to find a "clear" cluster in the space of all k-mers drawn from the upstream regulatory regions of a set of genes that potentially share similar binding sites. The cluster is identified with a minimum spanning tree (MST) technique, using a special distance between k-mers that is based on the chosen profile. We show that the widely used per-position "conservation" measure follows from a "common sense" requirement for conservation. We prove local convergence of the algorithm that maximizes the conservation of a profile, and we present a method for evaluating the statistical significance of the results. These ideas are implemented in the software CUBIC.
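To make the clustering idea concrete, the following is a minimal sketch of MST-based clustering of k-mers, not the CUBIC implementation. It assumes plain Hamming distance as a stand-in for the profile-based distance described above, and a fixed edge-weight threshold for cutting the tree; the function names (`kmers`, `hamming`, `minimum_spanning_tree`, `clusters_from_mst`) are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions (assumed stand-in for a profile-based distance)."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(points, dist):
    """Prim's algorithm on the complete graph over points.

    Returns a list of (i, j, weight) edges, where i and j index into points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge into the tree for each node
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def clusters_from_mst(n, edges, threshold):
    """Drop MST edges heavier than threshold and return the connected components.

    Components are lists of point indices, largest component first.
    """
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j, w in edges:
        if w <= threshold:
            root[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Toy upstream regions (made up for illustration only).
    regions = ["TTGACATT", "CCTTGACA", "GGTTGACG"]
    points = [km for r in regions for km in kmers(r, 6)]
    tree = minimum_spanning_tree(points, hamming)
    for cluster in clusters_from_mst(len(points), tree, threshold=1):
        print([points[i] for i in cluster])
```

In this sketch a tight cluster shows up as a subtree whose edges are all short. A method along the lines described above would replace `hamming` with a distance derived from the current profile and would assess whether the resulting cluster is statistically significant rather than relying on a fixed threshold.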