Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers

Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we will review Singularity Hub’s primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility metric and performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building, and deploying scientific containers.

[1]  Michael C. Frank,et al.  Estimating the reproducibility of psychological science , 2015, Science.

[2]  Saeed Hassanpour,et al.  Artificial Intelligence in Medicine , 2015 .

[3]  Idafen Santana-Perez,et al.  Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach , 2015, Sci. Program..

[4]  Satrajit S. Ghosh,et al.  The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments , 2016, Scientific Data.

[5]  Vanessa Sochat,et al.  Singularity: Scientific containers for mobility of compute , 2017, PloS one.

[6]  Karthik Ram,et al.  Git can facilitate greater reproducibility and increased transparency in science , 2013, Source Code for Biology and Medicine.

[7]  Christine L. Borgman,et al.  The conundrum of sharing research data , 2012, J. Assoc. Inf. Sci. Technol..

[8]  Jeffrey Heer,et al.  SpanningAspectRatioBank Easing FunctionS ArrayIn ColorIn Date Interpolator MatrixInterpola NumObjecPointI Rectang ISchedu Parallel Pause Scheduler Sequen Transition Transitioner Transiti Tween Co DelimGraphMLCon IData JSONCon DataField DataSc Dat DataSource Data DataUtil DirtySprite LineS RectSprite , 2011 .

[9]  Satrajit S. Ghosh,et al.  BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods , 2016, bioRxiv.

[10]  Wendy W. Chapman,et al.  Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm , 2011, J. Biomed. Informatics.

[11]  Mohamed Abouelhoda,et al.  The Case for Docker in Multicloud Enabled Bioinformatics Applications , 2016, IWBBIO.

[12]  Heather A. Piwowar,et al.  Sharing Detailed Research Data Is Associated with Increased Citation Rate , 2007, PloS one.

[13]  Dirk Merkel,et al.  Docker: lightweight Linux containers for consistent development and deployment , 2014 .

[14]  Jeffrey Heer,et al.  D³ Data-Driven Documents , 2011, IEEE Transactions on Visualization and Computer Graphics.

[15]  Odysseas G. Koufopavlou,et al.  On the hardware implementations of the SHA-2 (256, 384, 512) hash functions , 2003, Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03..

[16]  Monya Baker,et al.  Over half of psychology studies fail reproducibility test , 2015, Nature.

[17]  C. Borgman,et al.  If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology , 2013, PloS one.

[18]  Nikolaus Kriegeskorte,et al.  Frontiers in Systems Neuroscience Systems Neuroscience , 2022 .

[19]  Ronald L. Rivest,et al.  The MD5 Message-Digest Algorithm , 1992, RFC.