QuASAR: Quantitative Allele Specific Analysis of Reads

MOTIVATION: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred directly from the RNA-seq reads. However, no existing method jointly infers genotypes and conducts ASE inference while accounting for uncertainty in the genotype calls.

RESULTS: We present QuASAR (quantitative allele-specific analysis of reads), a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step accounts for the uncertainty in the genotype calls and includes parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.

AVAILABILITY AND IMPLEMENTATION: http://github.com/piquelab/QuASAR

CONTACT: fluca@wayne.edu or rpique@wayne.edu

SUPPLEMENTARY INFORMATION: Supplementary Material is available at Bioinformatics online.
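The conventional test described above (read counts at a heterozygous site compared against a 1:1 allelic ratio) can be illustrated with a minimal sketch. This is only the simple exact binomial baseline, not QuASAR's model: it assumes the genotype is known to be heterozygous and ignores base-call errors and allelic over-dispersion, which are the factors QuASAR adds parameters for. The function name `ase_binomial_pvalue` and the example read counts are hypothetical.

```python
from math import exp, lgamma, log


def ase_binomial_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial p-value for the null of allelic ratio p.

    Assumes the site is truly heterozygous and that reads are independent
    (no over-dispersion, no sequencing error); illustrative baseline only.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against the null

    def pmf(k: int) -> float:
        # Binomial probability computed in log space to avoid overflow.
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_choose + k * log(p) + (n - k) * log(1.0 - p))

    observed = pmf(ref_reads)
    # Sum the probability of every outcome at most as likely as the observed one.
    total = sum(q for q in (pmf(k) for k in range(n + 1)) if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 30 reference reads and 10 alternate reads at one site.
print(ase_binomial_pvalue(30, 10))
```

A small p-value here suggests allelic imbalance only if the heterozygous call is correct; a homozygous site miscalled as heterozygous would look like extreme imbalance, which is why the abstract stresses modeling genotype uncertainty.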
