Determination of the optimal parameters in regression models for the prediction of chlorophyll-a: a case study of the Yeongsan Reservoir, Korea.

Statistical regression models are built on linear equations and often produce significant prediction errors because of poor statistical stability and accuracy. This concern arises from multicollinearity in the models, which can drastically degrade model performance and complicate the trade-offs involved in effective water resource management. In this paper, we propose a new methodology for improving the statistical stability and accuracy of regression models, and then show how to cope with pitfalls in the models and determine optimal parameters with a reduced number of predictive variables. We compared the predictive performance of four types of multiple linear regression (MLR) and principal component regression (PCR) models in predicting chlorophyll-a (chl-a) concentration in the Yeongsan (YS) Reservoir, Korea, an estuarine reservoir that has historically suffered from high levels of nutrient input. Results from a 3-year water quality monitoring period showed that PCR could be a compact solution for improving model accuracy, because in each case MLR could not produce reliable predictions owing to a persistent collinearity problem. Furthermore, the curve of R² (goodness of fit), the overall F value (confidence of regression), and the number of explanatory variables (the R-F-N curve) revealed that PCR-F(7) was the best of the four regression models for predicting chl-a, having the fewest explanatory variables (seven) and the lowest uncertainty. Seven PCs were identified as significant variables, related to eight water quality parameters: pH, 5-day biochemical oxygen demand, total coliform, fecal indicator bacteria, chemical oxygen demand, ammonia-nitrogen, total nitrogen, and dissolved oxygen. Overall, the results not only demonstrated that the models successfully simulated chl-a in the reservoir in both the test and validation periods, but also suggested that the optimal parameters should be considered carefully in the design of regression models.
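
The idea of regressing on principal components and then judging each model by R², the overall F value, and the number of explanatory variables can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Yeongsan Reservoir data: the synthetic collinear predictors, the noise levels, and the names `fit_pcr` and `make_demo_data` are all assumptions made for illustration.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the leading principal-component scores of standardized X.

    Returns (R^2, overall F) for the fitted model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Standardize predictors so that no variable dominates through its units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions of the standardized predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T  # mutually orthogonal PC scores
    A = np.column_stack([np.ones(n), scores])  # intercept + scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    k = n_components
    # Overall F statistic for a regression with k explanatory variables.
    f_overall = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f_overall

def make_demo_data(n=120, seed=0):
    """Synthetic data (assumed): six predictors built from three latent factors."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=(n, 3))
    # Near-duplicate columns create strong multicollinearity.
    X = np.column_stack([base, base + 0.05 * rng.normal(size=(n, 3))])
    y = base @ np.array([1.0, -0.5, 0.8]) + 0.3 * rng.normal(size=n)
    return X, y

X, y = make_demo_data()
# Tabulate R^2 and F against the number of retained components (an R-F-N style table).
for k in range(1, X.shape[1] + 1):
    r2, f = fit_pcr(X, y, k)
    print(f"N={k}  R2={r2:.3f}  F={f:.1f}")
```

Because the component scores are orthogonal, R² cannot decrease as components are added, while the overall F value penalizes extra variables that contribute little. Reading the two together against N is one way to pick a compact model, in the spirit of the R-F-N curve described above.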
