Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems

Particulate material accumulates over time as cohesive layers on the internal pipe surfaces of water distribution systems (WDS). When mobilised, this material can cause discolouration. This paper explores the factors expected to be involved in the accumulation process. Two complementary machine learning methodologies are applied to substantial amounts of real-world field data, one from a qualitative and one from a quantitative perspective. First, Kohonen self-organising maps are used for integrative and interpretative multivariate data mining of potential factors affecting accumulation. Second, evolutionary polynomial regression (EPR) is applied; EPR is a hybrid data-driven technique that combines genetic algorithms with numerical regression to develop easily interpretable mathematical model expressions. Here it is used to explore whether novel, simple expressions can highlight the important accumulation factors. Three case studies are presented: one UK national study and two Dutch local studies. For the UK study, the results highlight bulk water iron concentration, pipe material and looped network areas as key descriptive parameters. At the local level, a substantially larger third data set allowed K-fold cross-validation. For an equation that estimates daily regeneration rate from the amount of material mobilised and soil temperature, the mean cross-validation coefficient of determination was 0.945 for training data and 0.930 for testing data. The approach shows promise for developing transferable expressions usable for proactive WDS management.
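The K-fold cross-validation summary described above (mean coefficient of determination over training and testing folds) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single-term model form `rate = a * mobilised * soil_temp` is hypothetical and is not the expression reported in the paper, and the helper names (`make_synthetic_data`, `fit_coefficient`, `k_fold_r2`) are invented here. EPR itself also searches over candidate term structures with a genetic algorithm, which this sketch does not attempt; it only shows the least-squares fit of one fixed term and the fold-wise scoring.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_coefficient(terms, y):
    """Least-squares coefficient a for the one-term model y ~ a * term."""
    return sum(t * v for t, v in zip(terms, y)) / sum(t * t for t in terms)


def k_fold_r2(terms, y, k=5, seed=0):
    """Return (mean training R^2, mean testing R^2) over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    train_scores, test_scores = [], []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        # Fit on the training folds only, then score both partitions.
        a = fit_coefficient([terms[j] for j in train_idx],
                            [y[j] for j in train_idx])
        train_scores.append(r_squared([y[j] for j in train_idx],
                                      [a * terms[j] for j in train_idx]))
        test_scores.append(r_squared([y[j] for j in test_idx],
                                     [a * terms[j] for j in test_idx]))
    return sum(train_scores) / k, sum(test_scores) / k


def make_synthetic_data(n=100, seed=1):
    """Synthetic stand-in data; NOT field measurements from the paper."""
    rng = random.Random(seed)
    terms, y = [], []
    for _ in range(n):
        mobilised = rng.uniform(1.0, 10.0)   # hypothetical units
        soil_temp = rng.uniform(5.0, 20.0)   # hypothetical degrees C
        term = mobilised * soil_temp         # hypothetical model term
        terms.append(term)
        y.append(0.02 * term + rng.gauss(0.0, 0.1))  # assumed rate + noise
    return terms, y


terms, rates = make_synthetic_data()
mean_train, mean_test = k_fold_r2(terms, rates, k=5)
print(f"mean training R^2: {mean_train:.3f}")
print(f"mean testing R^2:  {mean_test:.3f}")
```

Comparing the two means is a quick check for over-fitting: a testing value close to the training value, as in the figures quoted above, suggests the expression generalises to data it was not fitted on.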
