A New Strategy for Predicting Eukaryotic Promoters Based on Feature Boosting

Computational prediction of eukaryotic promoters is one of the most elusive problems in DNA sequence analysis. Although considerable effort has been devoted to this problem and a number of algorithms have been developed in the last few years, their performance still needs to improve. In this work, we developed a new algorithm called PPFB for promoter prediction, based on the following hypothesis: a promoter is determined by certain motifs or word patterns, and different promoters are determined by different motifs. We select the most promising motifs (i.e., features) by the divergence distance between the two classes and construct a classifier by feature boosting. Unlike other classifiers, ours adopts a different training and classification strategy. Computational results on large genomic sequences and comparisons with several leading algorithms show that our method is efficient, with better sensitivity and specificity.
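The feature-selection step can be illustrated with a minimal sketch. The abstract does not specify the divergence measure, the word length, or the smoothing, so everything here is an assumption: the score is the per-word term of the symmetric Kullback-Leibler divergence, (p - q) log(p / q), and the function names, the choice k = 4, and the pseudocount are hypothetical. This is not the PPFB implementation and does not cover the boosting stage.

```python
from collections import Counter
from math import log


def kmer_counts(seqs, k):
    """Count every overlapping k-mer (word) across a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def rank_kmers_by_divergence(promoters, background, k=4, top=5, pseudo=1.0):
    """Rank k-mers by their contribution to the symmetric KL divergence.

    Assumption: each word w is scored by (p_w - q_w) * log(p_w / q_w), where
    p_w and q_w are its smoothed frequencies in the promoter and
    non-promoter classes. The score is always non-negative.
    """
    cp, cn = kmer_counts(promoters, k), kmer_counts(background, k)
    vocab = set(cp) | set(cn)
    # Add-pseudo smoothing keeps every frequency strictly positive.
    total_p = sum(cp.values()) + pseudo * len(vocab)
    total_n = sum(cn.values()) + pseudo * len(vocab)
    scores = {}
    for w in vocab:
        p = (cp[w] + pseudo) / total_p
        q = (cn[w] + pseudo) / total_n
        scores[w] = (p - q) * log(p / q)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    pos = ["TATAAATATAAA", "GGTATAAAGG"]
    neg = ["GCGCGCGCGC", "ACGTACGTAC"]
    for word, score in rank_kmers_by_divergence(pos, neg):
        print(word, round(score, 4))
```

Words that are strongly over- or under-represented in one class receive the highest scores, so the ranking surfaces candidate features for either class; a classifier would then be trained on the selected words.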
