anndata: Annotated data

anndata is a Python package for handling annotated data matrices in memory and on disk (github.com/theislab/anndata), positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.

Statement of need

Generating insight from high-dimensional data matrices typically works through training models that annotate observations and variables via low-dimensional representations. In exploratory data analysis, this involves iterative training and analysis using original and learned annotations and task-associated representations. anndata offers a canonical data structure for book-keeping these, a need addressed neither by pandas (McKinney, 2010), nor by xarray (Hoyer & Hamman, 2017), nor by commonly used modeling packages like scikit-learn (Pedregosa et al., 2011).
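To make the book-keeping problem concrete, the following is a minimal toy sketch in standard-library Python. It is not the anndata API: the class `AnnotatedMatrix`, its fields `obs`, `var`, and `obs_repr`, and the method `subset_obs` are hypothetical names chosen for illustration, and all values are made up. It only shows the idea of keeping original annotations and learned representations aligned with the observations and variables of a data matrix.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedMatrix:
    """Toy container (hypothetical, not anndata): a matrix plus aligned annotations."""

    X: list                                       # one row per observation
    obs: dict = field(default_factory=dict)       # per-observation annotations
    var: dict = field(default_factory=dict)       # per-variable annotations
    obs_repr: dict = field(default_factory=dict)  # learned per-observation representations

    def __post_init__(self):
        self.check()

    def check(self):
        """Raise ValueError if any annotation is misaligned with the matrix."""
        n_obs = len(self.X)
        # The number of variables is unknown when the matrix has no rows.
        n_vars = len(self.X[0]) if self.X else None
        if any(len(row) != n_vars for row in self.X):
            raise ValueError("all rows of X must have the same length")
        for group in (self.obs, self.obs_repr):
            for name, values in group.items():
                if len(values) != n_obs:
                    raise ValueError(f"{name!r} needs one entry per observation")
        if n_vars is not None:
            for name, values in self.var.items():
                if len(values) != n_vars:
                    raise ValueError(f"{name!r} needs one entry per variable")

    def subset_obs(self, keep):
        """Return a new container restricted to the observation indices in `keep`."""

        def pick(values):
            return [values[i] for i in keep]

        return AnnotatedMatrix(
            X=pick(self.X),
            obs={k: pick(v) for k, v in self.obs.items()},
            var={k: list(v) for k, v in self.var.items()},
            obs_repr={k: pick(v) for k, v in self.obs_repr.items()},
        )


# Three observations, three variables, with one annotation on each axis.
data = AnnotatedMatrix(
    X=[[1.0, 0.0, 3.0], [0.0, 2.0, 0.0], [4.0, 0.0, 0.0]],
    obs={"condition": ["control", "treated", "control"]},
    var={"feature_name": ["f0", "f1", "f2"]},
)

# Attach a learned 2-D representation of the observations (made-up values).
data.obs_repr["embedding"] = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
data.check()

# Subsetting observations carries every aligned annotation along.
control_idx = [i for i, c in enumerate(data.obs["condition"]) if c == "control"]
controls = data.subset_obs(control_idx)
```

The design point of the sketch is that subsetting happens in one place, so annotations and learned representations cannot drift out of sync with the matrix during iterative analysis. A real implementation would add the features named above, such as sparse storage and on-disk access, which this toy does not attempt.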

[1] Fabian J. Theis, et al. Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape, 2021, Genome Biology.

[2] O. Stegle, et al. MUON: multimodal omics analysis framework, 2021, bioRxiv.

[3] Aaron M. Streets, et al. scvi-tools: a library for deep probabilistic analysis of single-cell omics data, 2021, bioRxiv.

[4] Sidney M. Bell, et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices, 2021, bioRxiv.

[5] Fabian J. Theis, et al. Squidpy: a scalable framework for spatial single cell analysis, 2021, bioRxiv.

[6] Raphael Gottardo, et al. Integrated analysis of multimodal single-cell data, 2020, Cell.

[7] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[8] Fabian J. Theis, et al. Generalizing RNA velocity to transient cell states through dynamical modeling, 2019, Nature Biotechnology.

[9] Raphael Gottardo, et al. Orchestrating single-cell analysis with Bioconductor, 2019, Nature Methods.

[10] Shila Ghazanfar, et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program, 2019, Nature.

[11] Leland McInnes, et al. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, 2018, ArXiv.

[12] Fabian J. Theis, et al. SCANPY: large-scale single-cell gene expression data analysis, 2018, Genome Biology.

[13] Benjamin Haibe-Kains, et al. Software for the integration of multi-omics experiments in Bioconductor, 2017, bioRxiv.

[14] Stephan Hoyer, et al. xarray: N-D labeled arrays and datasets in Python, 2017.

[15] Stavros Papadopoulos, et al. The TileDB Array Data Storage Manager, 2016, Proc. VLDB Endow.

[16] Raphael Gottardo, et al. Orchestrating high-throughput genomic analysis with Bioconductor, 2015, Nature Methods.

[17] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[18] Wes McKinney, et al. Data Structures for Statistical Computing in Python, 2010, SciPy.

[19] Benjamin S. Baumer, et al. Tidy data, 2022, Modern Data Science with R.