High DNA melting temperature predicts transcription start site location in human and mouse

The accurate computational prediction of transcription start sites (TSS) in vertebrate genomes is a difficult problem. The physicochemical properties of DNA can be computed in various ways, and many combinations of DNA features have been tested in the past as predictors of transcription. We looked in detail at melting temperature, which measures the temperature at which the two strands of DNA separate, taking into account the cooperative nature of this process. We find that peaks in melting temperature correspond closely to experimentally determined transcription start sites in human and mouse chromosomes. Using melting temperature alone, with simple thresholding, we can predict TSS with an accuracy that is competitive with the most accurate state-of-the-art TSS prediction methods. Accuracy is measured against both experimentally and manually determined TSS. The method works especially well for promoters that contain CpG islands, but it also works when CpG islands are absent. This result is clear evidence of the important role that the physical properties of DNA play in the process of transcription. It also suggests that TSS prediction methods should include melting temperature as prior information.
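The "simple thresholding" idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a per-position melting temperature profile has already been computed by some melting model, and the function names `predict_tss` and `_argmax`, the threshold value, and the toy profile are all hypothetical choices made for illustration. Each contiguous run of positions at or above the threshold is reduced to the position of its highest value, which stands in for a melting temperature peak.

```python
from typing import List, Sequence


def _argmax(values: Sequence[float], start: int, end: int) -> int:
    """Index of the largest value in values[start:end] (first one on ties)."""
    best = start
    for i in range(start + 1, end):
        if values[i] > values[best]:
            best = i
    return best


def predict_tss(tm_profile: Sequence[float], threshold: float) -> List[int]:
    """Return one candidate TSS position per run of values >= threshold.

    tm_profile is assumed to hold a precomputed melting temperature for
    each genomic position; threshold is a hypothetical cut-off that would
    have to be tuned on annotated data.
    """
    predictions: List[int] = []
    run_start = None  # start index of the current above-threshold run
    for i, tm in enumerate(tm_profile):
        if tm >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run just ended: keep its peak position.
            predictions.append(_argmax(tm_profile, run_start, i))
            run_start = None
    if run_start is not None:
        # The profile ended while still above the threshold.
        predictions.append(_argmax(tm_profile, run_start, len(tm_profile)))
    return predictions


# Toy profile (degrees Celsius, invented values) with two peaks.
toy_profile = [70, 72, 85, 90, 86, 71, 70, 88, 84, 70]
print(predict_tss(toy_profile, threshold=80))  # [3, 7]
```

Collapsing each run to a single peak keeps one prediction per candidate promoter region; a real evaluation would also need a distance tolerance when comparing predictions with annotated TSS, which this sketch does not cover.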
