Robustness and applicability of functional genomics tools on scRNA-seq data

Many tools have been developed to extract functional and mechanistic insight from bulk transcriptome profiling data. With the advent of single-cell RNA sequencing (scRNA-seq), it is in principle possible to perform such analyses for single cells. However, scRNA-seq data have distinct characteristics such as drop-out events, low library sizes and a comparatively large number of samples/cells. It is thus not clear whether functional genomics tools established for bulk sequencing can be applied to scRNA-seq in a meaningful way. To address this question, we performed benchmark studies on in silico and in vitro scRNA-seq data. We included the bulk RNA-seq tools PROGENy and GO enrichment, which estimate pathway activities, and DoRothEA, which estimates transcription factor (TF) activities, and compared them against AUCell and metaVIPER, tools designed for scRNA-seq. For the in silico study we simulated single cells from TF/pathway perturbation bulk RNA-seq experiments. Our simulation strategy guarantees that the information of the original perturbation is preserved while resembling the characteristics of scRNA-seq data. We complemented the in silico data with in vitro scRNA-seq data obtained upon CRISPR-mediated knock-out. On both the simulated and the real data, the bulk tools performed comparably to how they perform on the original bulk data. Additionally, we showed that the TF and pathway activities preserve cell-type-specific variability by analysing a mixture sample sequenced with 13 different scRNA-seq protocols. Our analyses suggest that bulk functional genomics tools can be applied to scRNA-seq data, outperforming dedicated single-cell tools. Furthermore, we provide a benchmark for further methods development by the community.
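To illustrate the idea behind simulating single cells from a bulk sample, the following is a minimal sketch and not necessarily the procedure used in this study. It assumes that each simulated cell is a multinomial draw of a small, fixed number of counts from the relative gene frequencies of one bulk sample. The function name `simulate_cells`, the toy bulk profile and the library size are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_cells(bulk_counts, n_cells, library_size, seed=0):
    """Draw sparse single-cell-like count profiles from one bulk sample.

    Assumption (illustrative only): every cell is an independent
    multinomial sample of `library_size` counts.
    """
    rng = np.random.default_rng(seed)
    bulk_counts = np.asarray(bulk_counts, dtype=float)
    # Relative gene frequencies of the bulk sample serve as sampling probabilities.
    probs = bulk_counts / bulk_counts.sum()
    # Rows are simulated cells, columns are genes; each row sums to library_size.
    return rng.multinomial(library_size, probs, size=n_cells)

# Hypothetical bulk profile with six genes, from highly expressed to absent.
bulk = np.array([5000, 1200, 300, 40, 5, 0])
cells = simulate_cells(bulk, n_cells=100, library_size=50)

# Fraction of zero entries per gene, i.e. the simulated drop-out rate.
dropout_rate = (cells == 0).mean(axis=0)
```

Under these assumptions the expected profile of a simulated cell is proportional to the bulk profile, so the perturbation signal is retained on average, while the small library size causes lowly expressed genes to drop out in most cells.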
