PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data

Abstract Single-cell RNA sequencing is an increasingly used method to measure gene expression at the single cell level and build cell-type atlases of tissues. Hundreds of single-cell sequencing datasets have already been published. However, studies are frequently deposited as raw data, a format difficult to access for biological researchers due to the need for data processing using complex computational pipelines. We have implemented an online database, PanglaoDB, accessible through a user-friendly interface that can be used to explore published mouse and human single cell RNA sequencing studies. PanglaoDB contains pre-processed and pre-computed analyses from more than 1054 single-cell experiments covering most major single cell platforms and protocols, based on more than 4 million cells from a wide range of tissues and organs. The online interface allows users to query and explore cell types, genetic pathways and regulatory networks. In addition, we have established a community-curated cell-type marker compendium, containing more than 6000 gene-cell-type associations, as a resource for automatic annotation of cell types.

[1]  Steve Horvath,et al.  WGCNA: an R package for weighted correlation network analysis , 2008, BMC Bioinformatics.

[2]  Sorin Draghici,et al.  Down-weighting overlapping genes improves gene set analysis , 2012, BMC Bioinformatics.

[3]  William R Sellers,et al.  Maintenance of adenomatous polyposis coli (APC)-mutant colorectal cancer is dependent on Wnt/β-catenin signaling , 2011, Proceedings of the National Academy of Sciences.

[4]  J. Mesirov,et al.  The Molecular Signatures Database Hallmark Gene Set Collection , 2015 .

[5]  Aleksandra A. Kolodziejczyk,et al.  The technology and biology of single-cell RNA sequencing. , 2015, Molecular cell.

[6]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Paul Hoffman,et al.  Integrating single-cell transcriptomic data across different conditions, technologies, and species , 2018, Nature Biotechnology.

[9]  Debra L. Fulton,et al.  TFCat: the curated catalog of mouse and human transcription factors , 2009, Genome Biology.

[10]  Edwin Cuppen,et al.  Sambamba: fast processing of NGS alignment formats , 2015, Bioinform..

[11]  Zhongming Zhao,et al.  scRNASeqDB: A Database for RNA-Seq Based Gene Expression Profiles in Human Single Cells , 2017, Genes.

[12]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[13]  J. Mesirov,et al.  The Molecular Signatures Database (MSigDB) hallmark gene set collection. , 2015, Cell systems.

[14]  Castrense Savojardo,et al.  eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes , 2017, BMC Genomics.

[15]  Yves Moreau,et al.  GRNBoost2 and Arboreto: efficient and scalable inference of gene regulatory networks , 2018, Bioinform..

[16]  Rhonda Bacher,et al.  Design and computational analysis of single-cell RNA-sequencing experiments , 2016, Genome Biology.

[17]  J. Aerts,et al.  SCENIC: Single-cell regulatory network inference and clustering , 2017, Nature Methods.

[18]  Mikhail Pachkov,et al.  SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates , 2012, Nucleic Acids Res..

[19]  Stein Aerts,et al.  iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections , 2014, PLoS Comput. Biol..

[20]  Steven L Salzberg,et al.  HISAT: a fast spliced aligner with low memory requirements , 2015, Nature Methods.

[21]  Gioele La Manno,et al.  Quantitative single-cell RNA-seq with unique molecular identifiers , 2013, Nature Methods.

[22]  Hiroshi Tanaka,et al.  FANTOM5 CAGE profiles of human and mouse samples , 2017, Scientific Data.

[23]  William Stafford Noble,et al.  FIMO: scanning for occurrences of a given motif , 2011, Bioinform..

[24]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[25]  Sachi Kato,et al.  SCPortalen: human and mouse single-cell centric database , 2017, Nucleic Acids Res..

[26]  Alex E. Lash,et al.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository , 2002, Nucleic Acids Res..

[27]  Kate B. Cook,et al.  Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity , 2014, Cell.

[28]  Feng Li,et al.  CellMarker: a manually curated resource of cell markers in human and mouse , 2018, Nucleic Acids Res..

[29]  N. Jayaram,et al.  Evaluating tools for transcription factor binding site prediction , 2016, BMC Bioinformatics.

[30]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[31]  Vladimir B. Bajic,et al.  HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models , 2015, Nucleic Acids Res..

[32]  Åsa K. Björklund,et al.  Full-length RNA-seq from single cells using Smart-seq2 , 2014, Nature Protocols.

[33]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[34]  A. Visel,et al.  Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. , 2010, Genome research.

[35]  Bronwen L. Aken,et al.  GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.

[36]  Leland McInnes,et al.  UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction , 2018, ArXiv.

[37]  Davis J. McCarthy,et al.  A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor , 2016, F1000Research.

[38]  J. Ernst,et al.  Cooperative Binding of Transcription Factors Orchestrates Reprogramming , 2017, Cell.

[39]  David J. Arenillas,et al.  JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework , 2017, Nucleic acids research.

[40]  A. Heger,et al.  UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy , 2016, bioRxiv.

[41]  Martha L. Bulyk,et al.  UniPROBE: an online database of protein binding microarray data on protein–DNA interactions , 2008, Nucleic Acids Res..

[42]  Hideaki Sugawara,et al.  The Sequence Read Archive , 2010, Nucleic Acids Res..