A centralized tool for managing, archiving, and serving point-in-time data in ecological research laboratories

The recent proliferation of software tools that aid researchers in various phases of data tracking and analysis undoubtedly contributes to the successful development of increasingly complex and data-intensive scientific investigations. However, the lack of fully integrated solutions for data acquisition and storage, quality assurance/control, visualization, and provenance tracking of heterogeneous temporal data streams collected at numerous geospatial locations remains a general problem for scientists and data managers working in the environmental sciences. We present a new Service Oriented Architecture (SOA) that allows users to: 1) automate the process of pushing real-time data streams from networks of environmental sensors or other data sources to an electronic data archive; 2) perform basic data management and quality control tasks; and 3) publish any subset of the data to existing cyberinfrastructure platforms for global discovery and distribution via the World Wide Web. The approach outlined here supports management of: 1) repeated field observations, 2) data from laboratory analysis of field samples, 3) simulation results, and 4) derived values. We describe how Hypertext Transfer Protocol (HTTP) Application Programming Interfaces (APIs) that expose Representational State Transfer (REST) methods for data model objects, together with Resource Query Language (RQL) interfaces, address a basic problem in environmental modelling: they let researchers integrate an electronic data repository with existing workflows, simulation models, or third-party software.

Integrating data life-cycle management with e-Science publication systems is needed. The VOEIS Data Hub (VDH), a new software application, responds to this need. VDH provides software tools for data storage, management, and visualization, and it integrates with models, third-party software, and other cyberinfrastructures.
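The abstract does not document the VDH endpoint layout, so the following is only a minimal sketch of how a client script might request a filtered subset of a data stream from a REST resource using an RQL query string. The host, the `data_streams` path, and the field names `site_id`, `variable`, and `timestamp` are hypothetical placeholders; the operators (`eq`, `ge`, `lt`, `sort`, `limit`) follow general RQL conventions, not a confirmed VDH interface.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical endpoint: VDH's real URL scheme is not given in the abstract.
BASE_URL = "https://vdh.example.org/projects/demo/data_streams"


def encode(value) -> str:
    """Percent-encode a value so RQL delimiters such as ( , ) & cannot leak in."""
    return quote(str(value), safe="")


def build_rql_query(site_id, variable, start, end, limit=100, offset=0) -> str:
    """Compose an RQL query string; top-level '&' joins the terms with AND."""
    terms = [
        f"eq(site_id,{encode(site_id)})",    # hypothetical field name
        f"eq(variable,{encode(variable)})",  # hypothetical field name
        f"ge(timestamp,{encode(start)})",    # inclusive lower time bound
        f"lt(timestamp,{encode(end)})",      # exclusive upper time bound
        "sort(+timestamp)",                  # ascending order by time
        f"limit({int(limit)},{int(offset)})",  # RQL limit(count,start)
    ]
    return "&".join(terms)


def build_request_url(**kwargs) -> str:
    """Attach the RQL query to the (hypothetical) resource URL."""
    return f"{BASE_URL}?{build_rql_query(**kwargs)}"


def fetch_json(url: str):
    """GET the resource and decode a JSON response body (not called below)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_request_url(site_id=12, variable="water_temp",
                            start="2012-06-01", end="2012-07-01", limit=500))
```

Building the query as plain text keeps the client independent of any SDK: a simulation model or workflow engine that can issue an HTTP GET can pull the same subset, which is the kind of integration the abstract attributes to the REST/RQL design.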
