Use of “default” parameter settings when analyzing single cell RNA sequencing data using Seurat: a biologist’s perspective

Aim: Analysis of large datasets has become integral to biological studies due to the advent of high-throughput technologies such as next-generation sequencing. Techniques for analyzing these large datasets are normally developed by bioinformaticists and statisticians, with input from biologists. Frequently, the end-user does not have the training or knowledge to make informed decisions on the input parameter settings required to implement the analysis pipelines. Instead, the end-user relies on “default” settings present within the software packages, consultations with in-house bioinformaticists, or methods described in previous publications. The aim of this study was to explore the effects of altering default parameters on the cell clustering solutions generated by a common pipeline implemented in the Seurat R package that is used to cluster cells based on single cell RNA sequencing (scRNAseq) data.

Methods: We systematically assessed the effect of altering input parameters by performing iterative analyses on a single scRNAseq dataset. We compared the clustering solutions obtained with the different input parameters to determine which parameters have a large effect on cell clustering solutions.

Results: We tested a range of values for many, but not all, of the input parameters required by the Seurat R pipeline. We found that some input parameters had a very small effect on the clustering solution, while other parameters had a much larger effect.

Conclusion: We conclude that, when implementing the Seurat R package, the “default” parameters should be used with caution. We identified specific parameters that have a significant effect on clustering solutions.

Schneider et al. J Transl Genet Genom 2020;4:[Online First] | http://dx.doi.org/10.20517/jtgg.2020.48
