Fast Computational Recovery of Missing Features for Large-scale Biological Data

Missing feature information is common in biological data and can seriously degrade the performance of existing data analysis methods. This chapter focuses on missing gene features in single-cell transcriptomics data. Single-cell sequencing has developed rapidly. Recent technological advances have made it possible to measure the intrinsic activity of individual cells at large scale and to analyze the cellular composition of tissues with high precision. Building on this technology, researchers have proposed many methods for identifying important biological structures in gene expression data. However, missing gene features still seriously hinder full exploration of the information these data contain: for most existing datasets, only about 20% of the gene expression profiles can be effectively measured. To address this problem, this chapter proposes a deep recurrent autoencoder that accurately and rapidly imputes missing gene expression values in datasets containing millions of cells.
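The core idea behind autoencoder-based imputation is to reconstruct each cell's expression profile through a low-dimensional bottleneck and to use the reconstruction only where values are missing. The sketch below is not the chapter's model. It substitutes a truncated SVD for the trained network, because an optimal linear autoencoder without bias terms projects onto the top singular directions. It also reads "recurrent" loosely as feeding each reconstruction back in as the next input. The function name `iterative_lowrank_impute` and its parameters `rank` and `n_iter` are hypothetical and chosen for illustration.

```python
import numpy as np


def iterative_lowrank_impute(x, observed, rank=2, n_iter=100):
    """Hypothetical helper: impute missing entries of a cells x genes matrix.

    Assumption: a rank-`rank` truncated SVD stands in for a trained linear
    autoencoder (encode = project onto the top right-singular vectors,
    decode = map back). The reconstruction is fed back in on every pass,
    and only the entries marked as missing are updated.
    """
    x = np.asarray(x, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Initial guess: fill each missing entry with its gene's mean over observed cells.
    counts = np.maximum(observed.sum(axis=0), 1)
    gene_means = np.where(observed, x, 0.0).sum(axis=0) / counts
    filled = np.where(observed, x, gene_means)

    for _ in range(n_iter):
        # Encode/decode through a `rank`-dimensional linear bottleneck.
        _, _, vt = np.linalg.svd(filled, full_matrices=False)
        basis = vt[:rank]                    # (rank, genes): "encoder" weights
        recon = (filled @ basis.T) @ basis   # (cells, genes): "decoder" output
        # Keep measured values fixed; overwrite only the missing ones.
        filled = np.where(observed, x, recon)
    return filled
```

Measured values are never overwritten, so the model only has to predict what was lost. A deep, nonlinear autoencoder trained on millions of cells, as the chapter proposes, would replace the SVD step. This sketch does not attempt to reproduce that.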
