Lognormality and oscillations in the coverage of high-throughput transcriptomic data towards gene ends

High-throughput transcriptomics experiments have reached the stage where the number of reads that align to a given position can be treated as an almost continuous signal. This allows us to ask questions that are biophysical or biotechnical in nature but may still have biological implications. Here we show that when RNA fragments are sequenced from one end, as is the case on most platforms, the read count oscillates at the other end. We further show that these oscillations are well described by Kolmogorov's 1941 broken stick model. We investigate how the model can be used to improve predictions of gene ends (3' transcript ends), but conclude that with present data the improvement is only marginal. The results highlight subtle effects in high-throughput transcriptomics experiments that do not have a biological origin but may still be used to obtain biological information.
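To give a feel for why a broken stick picture leads to lognormality, the following is a minimal sketch of sequential random breakage. It is not the fitting procedure used in this work. It assumes that every fragment is split in two at a uniformly random point in each round and that the number of rounds is fixed; the names `break_stick` and `log_moments` are hypothetical. Under these assumptions the logarithm of a fragment length is a sum of independent terms, so fragment lengths tend towards a lognormal distribution as the number of rounds grows.

```python
import math
import random


def break_stick(rounds, rng):
    """Return fragment lengths after `rounds` rounds of breaking a unit stick.

    Assumption: each fragment is split in two at a uniform random point.
    """
    fragments = [1.0]
    for _ in range(rounds):
        next_fragments = []
        for length in fragments:
            cut = rng.random()  # relative break position in [0, 1)
            next_fragments.append(length * cut)
            next_fragments.append(length * (1.0 - cut))
        fragments = next_fragments
    return fragments


def log_moments(fragments):
    """Return the mean and variance of log(length) over non-empty fragments."""
    logs = [math.log(x) for x in fragments if x > 0.0]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)
    return mean, var


rng = random.Random(0)            # fixed seed so the run is repeatable
fragments = break_stick(12, rng)  # 2**12 fragments
mean, var = log_moments(fragments)
print(f"fragments: {len(fragments)}, mean log length: {mean:.2f}, variance: {var:.2f}")
```

Each multiplicative factor is uniform on the unit interval, whose logarithm has mean -1 and variance 1, so a single fragment's log length after n rounds has mean -n and variance n. Fragments that share ancestors are correlated, so the sample values printed above only approximate those figures.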
