Tools and best practices for data processing in allelic expression analysis

Allelic expression analysis has become an important tool for integrating genome and transcriptome data to characterize biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and the technical sources of error that affect them: low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data.
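
As an illustration of the kind of read count data involved, the following minimal Python sketch tests a single heterozygous site for allelic imbalance with a two-sided binomial test. It is not the tool introduced in this work; the function name, the `expected_ref_ratio` parameter, and the example counts are hypothetical. The sketch assumes that reads have already been filtered for quality and duplicates, and that any residual allelic mapping bias is summarized by an expected reference ratio that may differ from 0.5.

```python
from math import exp, lgamma, log


def allelic_imbalance_p(ref_count, alt_count, expected_ref_ratio=0.5):
    """Two-sided binomial p-value for allelic imbalance at one site.

    expected_ref_ratio is the assumed null proportion of reference reads
    (0.5 with no bias; slightly higher if reference mapping bias remains).
    """
    if not 0.0 < expected_ref_ratio < 1.0:
        raise ValueError("expected_ref_ratio must lie strictly between 0 and 1")
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("site has no reads")

    def pmf(k):
        # Binomial probability computed in log space to avoid overflow
        # at high read depth.
        log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                 + k * log(expected_ref_ratio)
                 + (n - k) * log(1.0 - expected_ref_ratio))
        return exp(log_p)

    observed = pmf(ref_count)
    # Sum the probability of every outcome at least as unlikely as the
    # observed one; the small tolerance absorbs floating-point ties.
    p_value = sum(p for p in map(pmf, range(n + 1))
                  if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: 10 reference reads and 0 alternative reads.
p = allelic_imbalance_p(10, 0)
```

Because total read depth varies between sites, the same allelic ratio yields very different p-values at different depths, so a depth threshold or an effect-size filter is usually applied alongside the test.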
