Scipp: Scientific data handling with labeled multi-dimensional arrays for C++ and Python

Scipp is heavily inspired by the Python library xarray. It enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. Multiple arrays are combined into datasets. On top of this, scipp introduces (i) implicit handling of physical units, (ii) implicit propagation of uncertainties, (iii) support for histograms, i.e., bin-edge coordinate axes, which exceed the data's extent along that dimension by one, and (iv) support for event data. In conjunction, these features enable a more natural and more concise user experience. The combination of named dimensions, coordinates, and units helps to drastically reduce the risk of programming errors. The core of scipp is written in C++ to open opportunities for performance improvements that a Python-based solution would not allow. On top of the C++ core, scipp's Python components provide functionality for plotting and content representations, e.g., for use in Jupyter Notebooks. While none of scipp's concepts is novel per se in isolation, we are not aware of any project combining all of these aspects in a single coherent software package.
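The interplay of named dimensions, units, uncertainties, and bin-edge coordinates can be illustrated with a small standard-library sketch. This is not scipp's API: the `Variable` class, its fields, and the `is_bin_edge_coord` helper are hypothetical names chosen for illustration, the arrays are restricted to one dimension, and the variance rule assumes uncorrelated errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Toy 1-D labeled array. Hypothetical illustration, NOT scipp's API."""
    dim: str          # name of the single dimension
    values: tuple     # data values
    variances: tuple  # variances (squared standard deviations) of the values
    unit: str         # physical unit, kept as a plain string in this sketch

    def __add__(self, other):
        # Named dimensions: refuse to combine data along different axes.
        if self.dim != other.dim:
            raise ValueError(f"dimension mismatch: {self.dim} vs {other.dim}")
        # Units: refuse to add quantities with different units.
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        # Uncertainties: for a sum of uncorrelated quantities, variances add.
        return Variable(
            self.dim,
            tuple(a + b for a, b in zip(self.values, other.values)),
            tuple(va + vb for va, vb in zip(self.variances, other.variances)),
            self.unit,
        )


def is_bin_edge_coord(coord, data):
    """A bin-edge coordinate has one more entry than the data it labels."""
    return coord.dim == data.dim and len(coord.values) == len(data.values) + 1


# Three histogram bins of counts; Poisson statistics give variance == value.
counts = Variable("tof", (10.0, 20.0, 30.0), (10.0, 20.0, 30.0), "counts")
background = Variable("tof", (1.0, 2.0, 3.0), (1.0, 2.0, 3.0), "counts")
total = counts + background

# Four bin edges delimit the three bins.
edges = Variable("tof", (0.0, 1.0, 2.0, 3.0), (0.0, 0.0, 0.0, 0.0), "ms")
```

In this sketch, adding `edges` to `counts` raises a `ValueError` because the units differ, which is the kind of programming error that labeled, unit-aware arrays are meant to catch early.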
