Towards accurate human promoter recognition: a review of currently used sequence features and classification methods

This review describes important advances made during the past decade in genome-wide human promoter recognition. Interest in genome-wide promoter recognition algorithms is worldwide, and it bears on a number of practical systems that are important for the analysis of gene regulation and for genome annotation in the absence of experimental support from ESTs, cDNAs or mRNAs. The main focus of this review is on feature extraction and model selection for accurate human promoter recognition, with descriptions of what they are, what has been accomplished, and what remains to be done.
