High-resolution human core-promoter prediction with CoreBoost_HM.

Correctly locating the transcription start site and the core-promoter of a gene is important for understanding the mechanisms of transcriptional regulation. Here we integrate specific genome-wide histone modification data with DNA sequence features to predict RNA polymerase II core-promoters in the human genome. Our new predictor, CoreBoost_HM, outperforms existing promoter prediction algorithms, providing significantly higher sensitivity and specificity at high resolution. We demonstrate that, even though the histone modification data used in this study come from a specific cell type (CD4+ T cells), our method can identify both active and repressed promoters. We applied it to the upstream regions of microRNA genes and show that CoreBoost_HM accurately identifies the known promoters of intergenic microRNAs. We also identified a few intronic microRNAs that may have their own promoters. These results suggest that our method can help identify and characterize the core-promoters of both coding and noncoding genes.
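To make "sensitivity and specificity at high resolution" concrete, the following is a minimal sketch of one common way to score promoter predictions against annotated transcription start sites (TSSs) at a chosen distance tolerance. It is an illustration under stated assumptions, not the evaluation protocol of CoreBoost_HM: the function names, the 50 bp default tolerance, the use of positive predictive value as the "specificity" measure, and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left


def nearest_distance(pos, sorted_sites):
    """Distance in bp from pos to the closest entry of sorted_sites."""
    if not sorted_sites:
        return float("inf")
    i = bisect_left(sorted_sites, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in candidates)


def evaluate(predicted, known, max_dist=50):
    """Return (sensitivity, ppv) at a resolution of max_dist bp.

    sensitivity: fraction of known TSSs with a prediction within max_dist.
    ppv: fraction of predictions within max_dist of a known TSS
         (assumed here as the 'specificity' measure).
    Positions are assumed to lie on one chromosome and one strand.
    """
    predicted = sorted(predicted)
    known = sorted(known)
    hit_known = sum(1 for k in known
                    if nearest_distance(k, predicted) <= max_dist)
    hit_pred = sum(1 for p in predicted
                   if nearest_distance(p, known) <= max_dist)
    sensitivity = hit_known / len(known) if known else 0.0
    ppv = hit_pred / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy coordinates, invented for illustration only.
known_tss = [1000, 5000, 9000]
predictions = [1020, 5200, 9040, 12000]

for tolerance in (50, 500):
    sens, ppv = evaluate(predictions, known_tss, max_dist=tolerance)
    print(f"tolerance={tolerance} bp  sensitivity={sens:.2f}  ppv={ppv:.2f}")
```

Tightening the tolerance makes both scores harder to achieve, which is why a predictor's performance is only comparable to another's when both are scored at the same resolution.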
