Discriminant temporal patterns for linking physico-chemistry and biology in hydro-ecosystem assessment

We propose a new data mining process to extract original knowledge from hydro-ecological data and thereby help identify pollution sources. The approach is based on (1) a discretization of physico-chemical and biological parameters into quality classes defined by domain knowledge, and (2) the extraction of temporal patterns that serve as discriminant features linking physico-chemistry with biology at river sampling sites. For each quality value of a biological index, we obtain a set of significant discriminant features. We use the presence of physico-chemical parameters in these features to identify the physico-chemical characteristics that affect different biological dimensions. The experimental results agree with domain knowledge and also highlight significant mismatches between physico-chemical and biological quality classes. Finally, we discuss the value of discriminant temporal patterns for exploring and analysing temporal environmental data such as hydro-ecological databases.
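The two steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the nitrate class boundaries, the class labels, and the helper names (`quality_class`, `contains`, `support`, `growth_rate`) are all hypothetical, and growth rate (a measure from the emerging-pattern literature) stands in for whatever discriminance measure the paper actually uses. The sketch discretizes measurements into quality classes, then scores an ordered pattern by how much more often it occurs in sequences preceding one bio-index class than in the others.

```python
from bisect import bisect_right

# Hypothetical class boundaries for a single parameter (NOT taken from the paper).
NITRATE_BOUNDS = [2.0, 10.0, 25.0, 50.0]
LABELS = ["very good", "good", "medium", "poor", "bad"]


def quality_class(value, bounds=NITRATE_BOUNDS):
    """Map a raw measurement to a quality class label."""
    return LABELS[bisect_right(bounds, value)]


def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def growth_rate(pattern, target, others):
    """Ratio of the pattern's support in `target` to its support in `others`."""
    s_target = support(target, pattern)
    s_others = support(others, pattern)
    if s_others == 0:
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_others


# Toy data: successive nitrate readings at sites, grouped by the bio-index
# class observed afterwards (values invented for illustration).
bad_bio = [[30.0, 40.0, 60.0], [12.0, 28.0, 55.0]]
good_bio = [[3.0, 4.0, 11.0], [5.0, 26.0, 6.0]]

bad_seqs = [[quality_class(v) for v in seq] for seq in bad_bio]
good_seqs = [[quality_class(v) for v in seq] for seq in good_bio]

print(growth_rate(["poor", "bad"], bad_seqs, good_seqs))
print(growth_rate(["poor"], bad_seqs, good_seqs))
```

A pattern with a high growth rate for a given bio-index class is a candidate discriminant feature for that class; a real pipeline would also need to enumerate candidate patterns and filter them by statistical significance, which this sketch omits.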
