RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)

Background: Next-generation sequencing-based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the most 5’ ends of RNA molecules. After the sequenced reads are mapped back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single-nucleotide resolution.

Results: We propose a pipeline to group single-nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicates. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not.

Conclusions: By examining the reproducible fine-scale organization of TSSs, we can detect many differentially regulated peaks that previous approaches leave undetected.
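As an illustration of what grouping single-nucleotide TSSs into peaks at more than one scale can look like, the sketch below merges positions that lie within a fixed gap of each other and runs the merge at two gap sizes. This is not the clustering procedure used by the pipeline, which the abstract does not specify. The function `cluster_tss`, the `max_gap` parameter, and the toy counts are all hypothetical and chosen only to show narrow peaks nested inside a broader region.

```python
from typing import Dict, List, Tuple

# (start, end, total tag count), with inclusive coordinates.
Peak = Tuple[int, int, int]


def cluster_tss(counts: Dict[int, int], max_gap: int) -> List[Peak]:
    """Merge TSS positions whose neighbours lie within max_gap bases.

    Assumption: a simple gap-based merge on one strand of one chromosome,
    used here only as a stand-in for a real peak-calling method.
    """
    peaks: List[Peak] = []
    for pos in sorted(counts):
        if peaks and pos - peaks[-1][1] <= max_gap:
            # Extend the current peak to this position and add its tags.
            start, _, total = peaks[-1]
            peaks[-1] = (start, pos, total + counts[pos])
        else:
            # Start a new peak at this position.
            peaks.append((pos, pos, counts[pos]))
    return peaks


# Hypothetical CAGE tag counts per genomic position.
toy_counts = {100: 5, 101: 40, 102: 8, 110: 3, 111: 25, 300: 12}

broad_peaks = cluster_tss(toy_counts, max_gap=20)   # coarse regions
narrow_peaks = cluster_tss(toy_counts, max_gap=2)   # fine structure inside them
```

With these toy counts, the broad region spanning positions 100 to 111 contains two narrow peaks (100 to 102 and 110 to 111). Each narrow peak could then be compared across biological states on its own, separately from the region that contains it.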
