Comparison of Machine Learning Techniques and Variables for Groundwater Dissolved Organic Nitrogen Prediction in an Urban Area

Abstract: Dissolved inorganic nitrogen (DIN) is typically the main focus of nutrient management strategies; however, some studies have found that dissolved organic nitrogen (DON) can be the dominant form of total nitrogen (TN) in several Australian estuaries and catchments. To better understand nitrogen cycling and explore the relationships between measured groundwater DON and environmental factors, this study compared thirteen machine learning (ML) techniques. DON was simulated under two scenarios that differ in their input variables: 1) detailed nutrient data with landscape and sampling factors, and 2) limited nutrient data with landscape and sampling factors. Most of the tested ML algorithms predicted DON more accurately than the estimate obtained from the difference between TN and DIN. Some models showed greater adaptability to different modelling conditions, and only a few approaches predicted DON with high accuracy using limited input variables (scenario 2). Of the models tested, bagged MARS, Cubist and random forest were selected as optimal. Sample depth, sampling date and specific surface water area were the important non-nutrient input variables for DON prediction, which reveals a significant effect of surface environmental factors and seasonality on groundwater DON.
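The baseline that the ML models are compared against, DON estimated as the difference between TN and DIN, can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names, the use of RMSE as the error measure, and the sample values are all assumptions made here, and the numbers are invented purely to show the arithmetic.

```python
from math import sqrt


def don_by_difference(tn, din):
    """Estimate DON as TN minus DIN (same units for both, e.g. mg N/L)."""
    return tn - din


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical sample values (mg N/L), invented for illustration only.
samples = [
    {"tn": 2.0, "din": 1.5, "don_measured": 0.6},
    {"tn": 1.0, "din": 0.5, "don_measured": 0.5},
    {"tn": 3.0, "din": 2.0, "don_measured": 0.8},
]

estimates = [don_by_difference(s["tn"], s["din"]) for s in samples]
measured = [s["don_measured"] for s in samples]

# Error of the difference-based estimate against measured DON.
baseline_rmse = rmse(estimates, measured)
print(f"difference-method RMSE: {baseline_rmse:.3f} mg N/L")
```

An ML model's predictions for the same samples could be passed to the same `rmse` function, so that both approaches are scored against measured DON on an equal footing.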
