RNA-Seq Data: A Complexity Journey

The research highlight “Transcriptomics: Throwing light on dark matter” by L. Flintoft (Nature Reviews Genetics 11, 455, 2010) states: “Reports over the past few years of extensive transcription throughout eukaryotic genomes have led to considerable excitement. However, doubts have been raised about the methods that have detected this pervasive transcription and about how much of it is functional.” Since the ENCODE project and the follow-up work it prompted, attention has gradually shifted from observing pervasive transcription in RNA-Seq data to validating its function. Much less attention has been paid, however, to deciphering the complexity of transcriptome data, which creates uncertainty in the identification, quantification and differential expression analysis of genes and non-coding RNAs. The aim of this mini-review is to highlight transcriptome-related problems, both direct and inverse in nature, for which novel inference approaches are needed.
