CyberGIS‐Jupyter for reproducible and scalable geospatial analytics

The interdisciplinary field of cyberGIS, defined as geographic information science and systems (GIS) based on advanced cyberinfrastructure, has a major focus on data‐ and computation‐intensive geospatial analytics. The rapidly growing need across many application and science domains for such analytics based on disparate geospatial big data poses significant challenges to conventional GIS approaches. This paper describes CyberGIS‐Jupyter, an innovative cyberGIS framework for achieving data‐intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on ROGER, the first cyberGIS supercomputer. The framework adapts the Notebook with built‐in cyberGIS capabilities to accelerate gateway application development and sharing, while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud‐computing approaches. As a result, data‐intensive and scalable geospatial analytics can be efficiently developed and improved, and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment.
