Feedback on data collection, data modeling and data integration of large datasets: application to the Rhine-Meuse and Rhone-Mediterranean districts (France)

To better understand how hydrosystems function, we need to improve our knowledge of hydrobiological processes as well as identify and quantify the associated pressures. In this context, the ANR11-MONU14 Fresqueau project brings together data miners and hydrobiologists to define a new knowledge discovery process, based on datasets provided by public databases, that fully meets expert requirements. The required data fall into five major categories: (i) data on water quality, (ii) data characterizing sampling reaches, (iii) data describing the hydrographic network, (iv) data estimating human activities (land use and wastewater treatment plants) and (v) climate and environmental forcing variables. All these data are spatial, and their volume and nature make them complex to structure and interconnect. They are highly heterogeneous in their origin (values from measurements or expert judgment), their type (quantitative, semi-quantitative or qualitative), their spatial structure (point, line or polygon) and their temporal variability (sampling duration and frequency). The objective of this presentation is to introduce the first phase of our work: data gathering, data modeling and data integration. The inventory covers two French districts: Rhine-Meuse (33 000 km²) and Rhone-Mediterranean and Corsica (130 000 km²). We present the main operational lessons drawn from working with the 16 public databases concerned (access, rights of use, data formats, etc.). As a result, we present the conceptual data model (data standardization and positional linking).