Notebook-as-a-VRE (NaaVRE): from private notebooks to a collaborative cloud virtual research environment

Virtual Research Environments (VREs) provide user-centric support throughout the lifecycle of research activities, e.g., discovering and accessing research assets, or composing and executing application workflows. A VRE is typically implemented as an integrated environment that includes a catalog of research assets, a workflow management system, a data management framework, and tools for collaboration among users. Notebook environments, such as Jupyter, allow researchers to rapidly prototype scientific code and share their experiments as notebooks that are accessible online. Jupyter supports several languages popular among data scientists, such as Python, R, and Julia. However, such notebook environments lack seamless support for running heavy computations on remote infrastructure or for finding and accessing software code inside notebooks. This paper investigates the gap between a notebook environment and a VRE and proposes an embedded VRE solution for the Jupyter environment called Notebook-as-a-VRE (NaaVRE). NaaVRE provides functional components via a component marketplace and allows users to create a customized VRE on top of the Jupyter environment. From the VRE, a user can search research assets (data, software, and algorithms), compose workflows, manage the lifecycle of an experiment, and share the results with other users in the community. We demonstrate how such a solution can enhance a legacy workflow that uses Light Detection and Ranging (LiDAR) data from country-wide airborne laser scanning surveys to derive geospatial data products of ecosystem structure at high resolution over broad spatial extents. The solution enables users to scale out the processing of multi-terabyte LiDAR point clouds for ecological applications to more data sources in a distributed cloud environment.

Keywords: Virtual research environment, Jupyter, Cloud
