New data mining and calibration approaches to the assessment of water treatment efficiency

This paper considers, for the first time, the application of several data mining techniques to the assessment of water treatment performance. Principal component analysis (PCA), parallel factor analysis (PARAFAC), and a self-organizing map (SOM) were used to analyse multivariate fluorescence data characterising organic matter (OM) removal at 16 water treatment works. The components extracted by PCA, PARAFAC and SOM were then used as inputs for calibrating fluorescence against OM concentrations with stepwise regression (SR), partial least squares (PLS), multiple linear regression (MLR), and a back-propagation neural network (BPNN). The best results were obtained with the PARAFAC/PLS and SOM/BPNN combinations. The approaches are compared in terms of both numerical accuracy and practical feasibility, and recommendations are given on the use of these techniques for fluorescence data analysis.
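The pipeline described above has two stages: decompose the fluorescence data into a small number of components, then regress OM concentration on those components. A minimal sketch of the simplest pairing named in the abstract, PCA scores fed into MLR (principal component regression), is shown below. It assumes NumPy, assumes each sample's excitation-emission matrix has already been unfolded into one row of intensities, and uses purely synthetic data; the function names are hypothetical, and this is not the authors' implementation, nor the PARAFAC/PLS or SOM/BPNN combinations they found to perform best.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred samples onto the leading principal components (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal component loadings, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T  # shape: (n_features, n_components)
    return Xc @ loadings, mean, loadings


def fit_mlr(T, y):
    """Ordinary least-squares multiple linear regression: y ~ b0 + T @ b."""
    A = np.column_stack([np.ones(len(T)), T])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, T):
    """Apply the fitted intercept and slopes to new component scores."""
    return coef[0] + T @ coef[1:]


# Hypothetical synthetic data: 40 samples x 60 unfolded fluorescence intensities,
# generated from two latent "fluorophores" whose amounts drive a made-up OM concentration.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 60
amounts = rng.uniform(0.5, 5.0, size=(n_samples, 2))
spectra = np.abs(rng.normal(size=(2, n_features)))
X = amounts @ spectra + 0.01 * rng.normal(size=(n_samples, n_features))
y = 1.2 * amounts[:, 0] + 0.4 * amounts[:, 1]  # illustrative "OM concentration"

# Stage 1: decomposition; stage 2: calibration on the component scores.
T, mean, loadings = pca_scores(X, n_components=2)
coef = fit_mlr(T, y)
y_hat = predict_mlr(coef, T)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"calibration R^2 = {r2:.3f}")
```

In a real calibration the number of components would normally be chosen by cross-validation, and predictive accuracy would be judged on samples held out from the fit rather than on the training data shown here.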
