Sign-constrained linear regression for prediction of microbe concentration based on water quality datasets.

This study presents a novel methodology for estimating the concentration of environmental pollutants in water, such as pathogens, from environmental parameters. Its distinguishing feature is that it prevents overfitting by applying domain knowledge, that is, the accumulated scientific knowledge about the correlations between the response and explanatory variables. Sign constraints were used to express this domain knowledge, and their effect on prediction performance was investigated using censored datasets. The sign constraints made predictions more accurate than conventional sign-free approaches. The most notable technical contribution is the finding that sign constraints can be incorporated into the estimation of the correlation coefficient in Tobit analysis. We developed effective and numerically stable algorithms for fitting a model to datasets under the sign constraints. These algorithms are applicable to a wide range of pollutant contamination-level prediction problems, including the prediction of pathogen concentrations in water.
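The abstract does not spell out the fitting algorithm, so the following is only a minimal sketch of what a sign-constrained Tobit fit could look like, not the authors' method. It assumes a single left-censoring limit, Gaussian noise, and a plain projected gradient ascent with backtracking on the Tobit log-likelihood; the function names, the synthetic data, the detection limit, and the chosen signs are all hypothetical.

```python
import math
import random

def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(z):
    # log Phi(z) via erfc; the floor avoids log(0) for very negative z
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))

def loglik(theta, X, y, censored):
    """Tobit log-likelihood; theta holds the weights followed by log(sigma)."""
    *w, log_s = theta
    s = math.exp(log_s)
    total = 0.0
    for xi, yi, ci in zip(X, y, censored):
        z = (yi - sum(wj * xj for wj, xj in zip(w, xi))) / s
        # censored rows store the detection limit in yi
        total += norm_logcdf(z) if ci else norm_logpdf(z) - log_s
    return total

def gradient(theta, X, y, censored):
    *w, log_s = theta
    s = math.exp(log_s)
    g = [0.0] * len(theta)
    for xi, yi, ci in zip(X, y, censored):
        z = (yi - sum(wj * xj for wj, xj in zip(w, xi))) / s
        if ci:
            lam = math.exp(norm_logpdf(z) - norm_logcdf(z))  # inverse Mills ratio
            dw, ds = -lam / s, -lam * z
        else:
            dw, ds = z / s, z * z - 1.0
        for j, xj in enumerate(xi):
            g[j] += dw * xj
        g[-1] += ds
    return g

def project(theta, signs):
    """Clip each weight to its required sign (+1, -1, or 0 for unconstrained)."""
    out = list(theta)
    for j, sg in enumerate(signs):
        if sg > 0:
            out[j] = max(out[j], 0.0)
        elif sg < 0:
            out[j] = min(out[j], 0.0)
    out[-1] = min(max(out[-1], -20.0), 20.0)  # keep exp(log sigma) in range
    return out

def fit(X, y, censored, signs, iters=500):
    theta = project([0.0] * (len(signs) + 1), signs)
    step = 1.0 / len(y)
    best = loglik(theta, X, y, censored)
    for _ in range(iters):
        g = gradient(theta, X, y, censored)
        # backtracking: halve the step until the projected move
        # does not lower the log-likelihood
        while step > 1e-12:
            cand = project([t + step * gi for t, gi in zip(theta, g)], signs)
            val = loglik(cand, X, y, censored)
            if val >= best:
                theta, best = cand, val
                step *= 2.0
                break
            step *= 0.5
    return theta[:-1], math.exp(theta[-1]), best

# Synthetic, purely illustrative data: intercept plus two water quality parameters.
rng = random.Random(0)
limit = 1.0  # hypothetical detection limit (e.g. on a log10 scale)
X, y, censored = [], [], []
for _ in range(60):
    x1, x2 = rng.uniform(0, 2), rng.uniform(0, 2)
    latent = 0.5 + 0.8 * x1 + rng.gauss(0, 0.5)  # x2 has no true effect
    X.append([1.0, x1, x2])
    censored.append(latent < limit)
    y.append(max(latent, limit))

# Assumed domain knowledge: x1 cannot lower, and x2 cannot raise, the concentration.
signs = [0, +1, -1]
w, sigma, ll = fit(X, y, censored, signs)
print(w, sigma, ll)
```

Because each update is projected back onto the sign-feasible region, the returned weights always respect the stated signs, and a coefficient whose unconstrained estimate would have the "wrong" sign is held at zero instead of fitting noise. A production implementation would more likely use a bound-constrained quasi-Newton solver than this fixed-budget gradient loop.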
