A template framework for environmental timeseries data acquisition

Abstract

The variety of environmental timeseries data is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remain a laborious step of the environmental data lifecycle. Heterogeneity is a persistent issue, as data become available through different protocols and are stored in diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming language-agnostic semantics and can be reused for both input and output, thus enabling data reuse via transformations across different formats. We demonstrate the generality of EDAM in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. The case studies span different environmental science domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. Through the case studies, we identified several syntactic interoperability challenges, which include managing differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.
