Simultaneous characterization of sense and antisense genomic processes by the double-stranded hidden Markov model

Hidden Markov models (HMMs) have been extensively used to dissect the genome into functionally distinct regions using data such as RNA expression or DNA binding measurements. However, disentangling processes that occur on complementary strands of the same genomic region remains a challenge. We present the double-stranded HMM (dsHMM), a model for the strand-specific analysis of genomic processes. We applied dsHMM to yeast using strand-specific transcription data, nucleosome data, and protein binding data for a set of 11 factors associated with the regulation of transcription. The resulting annotation recovers the mRNA transcription cycle (initiation, elongation, termination) while correctly predicting the strand-specificity and directionality of the transcription process. We find that pre-initiation complex formation is an essentially undirected process, giving rise to a large number of bidirectional promoters and to pervasive antisense transcription. Notably, 12% of all transcriptionally active positions showed simultaneous activity on both strands. Furthermore, dsHMM reveals that antisense transcription is specifically suppressed by Nrd1, a yeast termination factor.
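To make the "simultaneous activity on both strands" statistic concrete, the following minimal sketch shows one plausible way to compute such a fraction from two per-position activity calls, one per strand. It is an illustration only and not the dsHMM implementation: the function name `bidirectional_fraction`, the boolean encoding of activity, and the toy input are all assumptions, as is the choice of denominator (positions active on at least one strand).

```python
def bidirectional_fraction(plus_active, minus_active):
    """Fraction of transcriptionally active positions that are active on both strands.

    Assumption: each argument is a sequence of truthy/falsy per-position
    activity calls for one strand (hypothetical encoding, not from the paper).
    A position counts as "active" if it is active on at least one strand.
    """
    if len(plus_active) != len(minus_active):
        raise ValueError("strand tracks must cover the same positions")

    pairs = list(zip(plus_active, minus_active))
    active_either = sum(1 for p, m in pairs if p or m)   # denominator
    active_both = sum(1 for p, m in pairs if p and m)    # numerator

    # With no active positions the fraction is undefined; return 0.0 by convention.
    return active_both / active_either if active_either else 0.0


# Toy example (made-up data): 8 positions, activity called per strand.
plus_strand = [0, 1, 1, 1, 0, 0, 1, 0]
minus_strand = [0, 0, 1, 0, 0, 1, 1, 0]
fraction = bidirectional_fraction(plus_strand, minus_strand)
print(f"{fraction:.2f}")
```

In this toy input, five positions are active on at least one strand and two of those are active on both, so the fraction is 2/5.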
