Improved integration of single-cell transcriptome and surface protein expression by LinQ-View

Summary Multimodal advances in single-cell sequencing have enabled the simultaneous quantification of cell surface protein expression alongside unbiased transcriptional profiling. Here, we present LinQ-View, a toolkit designed for multimodal single-cell data visualization and analysis. LinQ-View integrates transcriptional and cell surface protein expression profiling data to reveal more accurate cell heterogeneity and proposes a quantitative metric for cluster purity assessment. Through comparison with existing multimodal methods on multiple public CITE-seq datasets, we demonstrate that LinQ-View efficiently generates accurate cell clusters, especially in CITE-seq data with routine numbers of surface protein features, by preventing variations in a single surface protein feature from affecting results. Finally, we utilized this method to integrate single-cell transcriptional and protein expression data from SARS-CoV-2-infected patients, revealing antigen-specific B cell subsets after infection. Our results suggest LinQ-View could be helpful for multimodal analysis and purity assessment of CITE-seq datasets that target specific cell populations (e.g., B cells).

[1]  D. Choudhuri,et al.  Exceptional increase in the creep life of magnesium rare-earth alloys due to localized bond stiffening , 2017, Nature Communications.

[2]  Neville E. Sanjana,et al.  Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells , 2019, Nature Methods.

[3]  Andrew C. Adey,et al.  Single-Cell Transcriptional Profiling of a Multicellular Organism , 2017 .

[4]  J. C. Dunn,et al.  A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters , 1973 .

[5]  Laurens van der Maaten,et al.  Accelerating t-SNE using tree-based algorithms , 2014, J. Mach. Learn. Res..

[6]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[7]  R. Sandberg,et al.  Full-Length mRNA-Seq from single cell levels of RNA and individual circulating tumor cells , 2012, Nature Biotechnology.

[8]  Laleh Haghverdi,et al.  Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors , 2018, Nature Biotechnology.

[9]  Paul J. Hoffman,et al.  Comprehensive Integration of Single-Cell Data , 2018, Cell.

[10]  J. Marioni,et al.  MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data , 2020, Genome Biology.

[11]  Deborah F. Swayne,et al.  Data Visualization With Multidimensional Scaling , 2008 .

[12]  Fabian J Theis,et al.  SCANPY: large-scale single-cell gene expression data analysis , 2018, Genome Biology.

[13]  Paul Hoffman,et al.  Integrating single-cell transcriptomic data across different conditions, technologies, and species , 2018, Nature Biotechnology.

[14]  Åsa K. Björklund,et al.  Smart-seq2 for sensitive full-length transcriptome profiling in single cells , 2013, Nature Methods.

[15]  M. Nussenzweig,et al.  Polyreactive antibodies in adaptive immune responses to viruses , 2011, Cellular and Molecular Life Sciences.

[16]  Karlynn E. Neu,et al.  Polyreactive Broadly Neutralizing B cells Are Selected to Provide Defense against Pandemic Threat Influenza Viruses , 2020, Immunity.

[17]  Chenwei Li,et al.  An entropy-based metric for assessing the purity of single cell populations , 2020, Nature Communications.

[18]  Henry A. Utset,et al.  Profiling B cell immunodominance after SARS-CoV-2 infection reveals antibody evolution to non-neutralizing viral targets , 2021, Immunity.

[19]  Sijian Wang,et al.  SPARSE INTEGRATIVE CLUSTERING OF MULTIPLE OMICS DATA SETS. , 2013, The annals of applied statistics.

[20]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[21]  R. Satija,et al.  Integrative single-cell analysis , 2019, Nature Reviews Genetics.

[22]  T. Hashimshony,et al.  CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. , 2012, Cell reports.

[23]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[24]  Adam B. Olshen,et al.  Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis , 2009, Bioinform..

[25]  Zhuowen Tu,et al.  Similarity network fusion for aggregating data types on a genomic scale , 2014, Nature Methods.

[26]  Xianwen Ren,et al.  Direct Comparative Analyses of 10X Genomics Chromium and Smart-seq2 , 2019, bioRxiv.

[27]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[28]  Leland McInnes,et al.  UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction , 2018, ArXiv.

[29]  Hani Jieun Kim,et al.  CiteFuse enables multi-modal analysis of CITE-seq data. , 2020, Bioinformatics.

[30]  Vanessa M. Peterson,et al.  Multiplexed quantification of proteins and transcripts in single cells , 2017, Nature Biotechnology.

[31]  Richard A. Muscat,et al.  Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding , 2018, Science.

[32]  Catalin C. Barbacioru,et al.  mRNA-Seq whole-transcriptome analysis of a single cell , 2009, Nature Methods.

[33]  J. Marioni,et al.  Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets , 2018, Molecular systems biology.