Ensemble of naïve Bayesian approaches for the study of biofilm development in drinking water distribution systems

Many studies have examined how individual characteristics of drinking water distribution systems (DWDSs) influence biofilm development. Their joint influence, however, has scarcely been studied, with few exceptions, because of the complexity of the community and the environment. In this paper, we apply several machine learning algorithms based on naïve Bayesian networks. We suggest alternatives to the base naïve Bayesian model that improve on its individual performance while maintaining simplicity. These alternatives include augmenting the graph with additional arcs and initial bagging approaches. Finally, we propose combining different naïve approaches in a bagging process that produces explanatory hybrid decision trees. As a result, it is possible to gain a deeper understanding of how the interaction of the relevant hydraulic and physical factors of DWDSs affects biofilm development.
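To make the idea of bagging naïve Bayesian classifiers concrete, the following is a minimal sketch and not the implementation used in the paper. It covers only plain naïve Bayes with bootstrap aggregation (majority vote); arc augmentation and the hybrid decision trees are not shown. The class `CategoricalNaiveBayes`, the function `bagged_predict`, the feature names, and the toy records are all hypothetical and chosen purely for illustration.

```python
import math
import random
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # Distinct values seen per feature in this training sample.
        self.values = [set(r[j] for r in rows) for j in range(n_features)]
        # counts[(feature index, class)][value] = number of matching rows.
        self.counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[(j, label)][value] += 1
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / self.n)
            for j, value in enumerate(row):
                num = self.counts[(j, c)][value] + 1
                den = self.class_counts[c] + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def bagged_predict(rows, labels, query, n_models=25, seed=0):
    """Train n_models classifiers on bootstrap samples and take a majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        model = CategoricalNaiveBayes().fit(
            [rows[i] for i in idx], [labels[i] for i in idx]
        )
        votes[model.predict(query)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (pipe material, flow regime) -> biofilm level.
rows = [("metal", "slow")] * 4 + [("plastic", "fast")] * 4
labels = ["high"] * 4 + ["low"] * 4

print(bagged_predict(rows, labels, ("metal", "slow")))
```

Each bootstrap model sees a slightly different sample, so the vote smooths out the variance of any single classifier while every member remains a simple, inspectable naïve Bayes model.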
