Differential analysis of gene regulation at transcript resolution with RNA-seq

Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for lung fibroblast and HeLa cell cycle progression. Loss of HOXA1 results in significant expression level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. Cuffdiff 2 performs robust differential analysis in RNA-seq experiments at transcript resolution, revealing a layer of regulation not readily observable with other high-throughput technologies.
