DNA Free Energy-Based Promoter Prediction and Comparative Analysis of Arabidopsis and Rice Genomes

The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. Here we analyze the performance of an in silico method that predicts cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of the free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, of which 20% were found in the first intron alone, suggesting that these regions may have regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method has improved prediction capability.
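As a reading aid for the recall and precision figures quoted above, the following minimal sketch shows how the two metrics are conventionally computed from counts of true positives, false positives, and false negatives. The function name `recall_precision` and the counts are hypothetical and chosen only for illustration; they are not data from this study, and this section does not state the criteria used to score a prediction as true or false.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) from prediction counts.

    tp: predictions that match an annotated promoter (true positives)
    fp: predictions that match no annotated promoter (false positives)
    fn: annotated promoters with no matching prediction (false negatives)
    """
    # Recall: fraction of annotated promoters that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical counts, for illustration only (not results from the paper).
recall, precision = recall_precision(tp=8, fp=12, fn=2)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

High recall combined with lower precision, as reported for protein-coding genes, corresponds to a small `fn` and a comparatively large `fp` in this formulation.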
