Multidimensional modeling and analysis of large and complex watercourse data: an OLAP-based solution

Abstract: This paper presents the application of Data Warehouse (DW) and On-Line Analytical Processing (OLAP) technologies to water quality assessment. The European Water Framework Directive (DCE, 2000) underlined the need for operational tools that help interpret the complex and abundant information on running waters and their functioning. Several studies have demonstrated the value of DWs for integrating large volumes of data and of OLAP tools for data exploration and analysis. Based on free software tools, we propose an extensible relational OLAP system for the analysis of physicochemical and hydrobiological watercourse data. This system includes: (i) two data cubes; (ii) an Extract, Transform and Load (ETL) tool for data integration; and (iii) tools for OLAP exploration. Many examples of OLAP analysis (thematic, temporal, spatiotemporal, and multiscale) are provided. We have extended an existing framework with complex aggregate functions that are used to define complex analysis indicators. Additional analysis dimensions are introduced both to allow the calculation of these indicators and to render information. Finally, we propose two strategies for summarizing measurements expressed in heterogeneous units: (i) transforming source data at the ETL tier, and (ii) introducing an additional analysis dimension at the OLAP server tier.
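The two strategies for heterogeneous measurement units can be illustrated with a minimal sketch. It is not the system described in the paper: the sample measurements, the conversion table `TO_MG_PER_L`, and the helper names `normalize_at_etl` and `average` are all hypothetical, and plain Python stands in for the ETL tool and the OLAP server. Strategy (i) converts every value to a common unit before aggregation; strategy (ii) keeps the source values and adds the unit as a grouping dimension so that values in different units are never averaged together.

```python
from collections import defaultdict

# Hypothetical conversion factors to a common unit (mg/L); not taken from the paper.
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001, "g/L": 1000.0}

# Hypothetical fact rows: (station, parameter, value, unit).
MEASUREMENTS = [
    ("S1", "nitrate", 12.0, "mg/L"),
    ("S1", "nitrate", 8000.0, "ug/L"),
    ("S2", "nitrate", 0.02, "g/L"),
]


def normalize_at_etl(rows):
    """Strategy (i): convert every value to mg/L before loading."""
    return [(s, p, v * TO_MG_PER_L[u], "mg/L") for s, p, v, u in rows]


def average(rows, keys):
    """Average the value column (index 2), grouped by the given column indexes."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        group = tuple(row[i] for i in keys)
        sums[group][0] += row[2]
        sums[group][1] += 1
    return {group: total / count for group, (total, count) in sums.items()}


# Strategy (i): units are homogeneous after ETL, so grouping by
# (station, parameter) is meaningful.
etl_avg = average(normalize_at_etl(MEASUREMENTS), keys=(0, 1))

# Strategy (ii): the unit is an extra dimension, so each unit forms its own cell.
unit_dim_avg = average(MEASUREMENTS, keys=(0, 1, 3))

for group, value in sorted(etl_avg.items()):
    print("ETL-normalized", group, value)
for group, value in sorted(unit_dim_avg.items()):
    print("unit dimension", group, value)
```

In this sketch the first strategy yields one comparable average per station and parameter, while the second preserves the original values at the cost of one result cell per unit.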
