Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets

Motivation: Allelic expression analysis aids in detecting cis-regulatory mechanisms of genetic variation that produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data that lack temporal or spatial resolution has the limitation that cell-type-specific (CTS), spatially dependent, or time-dependent AI signals may be dampened or go undetected.

Results: We introduce airpart, a statistical method for identifying differential CTS AI from single-cell RNA-sequencing (scRNA-seq) data or other spatially or temporally resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. To account for low counts in single-cell data, our method uses a generalized fused lasso with a binomial likelihood to partition groups of cells by AI signal, and a hierarchical Bayesian model for statistical inference on AI. In simulation, airpart accurately detected partitions of cell types by their AI and had lower root mean squared error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential AI patterns across cell states and could be used to define trends of AI signal over spatial or time axes.

Availability: airpart is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
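As a rough illustration of the ideas above, the following Python sketch uses hypothetical toy counts for a single gene to show how pooling cell types (as bulk data does) can dampen a cell-type-specific allelic ratio, and how a partition in which groups of cell types share one allelic ratio can be scored with a binomial likelihood. This is not airpart's implementation or API: the counts, the function names, and the plain likelihood comparison are assumptions made for illustration, and no fused lasso penalty or Bayesian shrinkage is applied.

```python
from math import lgamma, log

# Hypothetical toy data for one gene: (reads from allele 1, total reads) per cell type.
counts = {
    "type_A": (80, 100),
    "type_B": (78, 100),
    "type_C": (50, 100),
    "type_D": (52, 100),
}

def allelic_ratio(a1, total):
    """Fraction of reads that come from allele 1."""
    return a1 / total

def binom_loglik(a1, total, p):
    """Binomial log-likelihood of a1 allele-1 reads out of total, given ratio p."""
    log_choose = lgamma(total + 1) - lgamma(a1 + 1) - lgamma(total - a1 + 1)
    return log_choose + a1 * log(p) + (total - a1) * log(1 - p)

def partition_loglik(groups):
    """Log-likelihood when every cell type within a group shares one allelic ratio."""
    ll = 0.0
    for group in groups:
        a1 = sum(counts[c][0] for c in group)
        total = sum(counts[c][1] for c in group)
        p = a1 / total  # maximum-likelihood shared ratio for this group
        ll += sum(binom_loglik(counts[c][0], counts[c][1], p) for c in group)
    return ll

# Per-cell-type ratios show strong imbalance in A and B, none in C and D.
per_type = {c: allelic_ratio(a1, total) for c, (a1, total) in counts.items()}

# Pooling all cell types, as in bulk data, dampens the imbalance.
pooled = allelic_ratio(
    sum(a1 for a1, _ in counts.values()),
    sum(total for _, total in counts.values()),
)

# Compare a single-group partition with a two-group partition.
ll_one_group = partition_loglik([["type_A", "type_B", "type_C", "type_D"]])
ll_two_groups = partition_loglik([["type_A", "type_B"], ["type_C", "type_D"]])

print(per_type)
print(round(pooled, 2))
print(ll_two_groups > ll_one_group)
```

In this toy setting, the two-group partition fits the counts better than a single shared ratio. A penalized approach such as the generalized fused lasso described above additionally trades fit against the number of distinct groups, which this sketch does not attempt.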
