APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools

Background: Scientific publications are meant to exchange knowledge among researchers, but the inability to properly reproduce computational experiments limits the quality of scientific research. Furthermore, the literature shows that more than 50% of preclinical research is irreproducible, which wastes a large amount of resources on unproductive research in the life sciences. As a consequence, scientific reproducibility is being fostered to promote Open Science through open databases and software tools that are typically deployed on existing computational resources. However, some computational experiments require complex virtual infrastructures, such as elastic clusters of PCs, that can be dynamically provisioned from multiple clouds. Obtaining these infrastructures requires not only an infrastructure provider but also advanced knowledge of cloud computing. Objectives: The main aim of this paper is to improve reproducibility in the life sciences in order to produce better and more cost-effective research. To that end, we intend to simplify infrastructure deployment and usage for researchers. Methods: This paper introduces the Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools (APRICOT), an open source extension for Jupyter that deploys deterministic virtual infrastructures across multiple clouds for reproducible scientific computational experiments. To illustrate its use, and how APRICOT can improve the reproduction of experiments with complex computational requirements, two examples from the life sciences are provided. All the requirements needed to reproduce both experiments are disclosed within APRICOT, so users can reproduce them. Results: To show the capabilities of APRICOT, we processed a real magnetic resonance image to accurately characterize a prostate cancer using a Message Passing Interface cluster deployed automatically with APRICOT.
In addition, the second example shows how APRICOT scales the deployed infrastructure according to the workload using a batch cluster. This example consists of a multiparametric study of positron emission tomography image reconstruction. Conclusion: APRICOT integrates the deployment, management, and usage of specific computational infrastructures for Open Science, making experiments that depend on such infrastructures reproducible. All the experiment steps and details can be documented in the same Jupyter notebook, including infrastructure specifications, data storage, experiment execution, results gathering, and infrastructure termination. Thus, distributing the experiment notebook and the required data should be enough to reproduce the experiment.
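The second example relies on a batch cluster that grows and shrinks with the workload. The following is a rough sketch of one possible scaling rule of that kind. The `ClusterState` type, the `scaling_decision` function, and the policy itself are hypothetical. They are not APRICOT's API or the actual rule used by any elasticity manager. The assumed policy adds nodes for queued jobs up to a cap and releases idle nodes down to a floor when the queue is empty.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a batch cluster (not an APRICOT API)."""
    queued_jobs: int  # jobs waiting in the batch queue
    busy_nodes: int   # worker nodes currently running a job
    idle_nodes: int   # powered-on worker nodes with no job


def scaling_decision(state: ClusterState, min_nodes: int, max_nodes: int,
                     jobs_per_node: int = 1) -> int:
    """Return how many worker nodes to add (> 0) or remove (< 0).

    Assumed policy (illustrative only): provision enough nodes for every
    queued job, capped at max_nodes; when the queue is empty, release idle
    nodes without going below min_nodes.
    """
    current = state.busy_nodes + state.idle_nodes
    if state.queued_jobs > 0:
        # Ceiling division: nodes required to run all queued jobs at once.
        needed = -(-state.queued_jobs // jobs_per_node)
        # Idle nodes can absorb part of the queue; only the rest is new.
        extra = max(0, needed - state.idle_nodes)
        # Never grow beyond the configured cap.
        return max(0, min(extra, max_nodes - current))
    # Empty queue: power off idle nodes down to the configured minimum.
    removable = min(state.idle_nodes, current - min_nodes)
    return -max(0, removable)
```

For instance, consider five queued jobs, two busy nodes, one idle node and a cap of four nodes. In that case `scaling_decision(ClusterState(5, 2, 1), 1, 4)` returns 1, because only one more node fits under the cap. A real deployment would also account for node boot time and cloud quotas, which this sketch ignores.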