Exploring Random Forest Machine Learning and Remote Sensing Data for Streamflow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed

Physically based hydrologic models require significant effort and extensive data for development, calibration, and validation. This study explored random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability to predict streamflow in this region. We implemented a customized approach to the RFR model to assess its performance for three training periods: 1991–2010, 1996–2010, and 2001–2010. The results indicated that the model’s accuracy improved with longer training periods, implying that a model trained on a longer period better captures the variability of the parameters and reproduces streamflow data more accurately. The variable importance measure (IncNodePurity) of the RFML model revealed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model reproduces the watershed’s streamflow data with a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but its monthly simulation warrants further exploration and model adjustments.
The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and for water resource management, including in snowmelt-driven semi-arid regions.
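
The three goodness-of-fit measures named above (d, R², and PBIAS) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the forms below are assumptions: d uses the original Willmott index of agreement (the study may instead use a refined variant), R² is taken as the squared Pearson correlation between observed and simulated flow, and PBIAS uses the convention in which positive values indicate underestimation. The function names and the sample flows are hypothetical and are not taken from the study.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def index_of_agreement(obs, sim):
    """Original Willmott index of agreement d (assumed form), between 0 and 1."""
    o_bar = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - o_bar) + abs(o - o_bar)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    o_bar, s_bar = mean(obs), mean(sim)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def pbias(obs, sim):
    """Percent bias; positive means the simulation underestimates (assumed convention)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical flows: the simulation is a uniform 10% underestimate.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [9.0, 18.0, 27.0, 36.0]

d_value = index_of_agreement(observed, simulated)
r2_value = r_squared(observed, simulated)
pbias_value = pbias(observed, simulated)
```

In this made-up case the simulation is perfectly correlated with the observations yet biased low, which is why R² alone is usually reported alongside d and PBIAS rather than instead of them.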

[87]  D. Roark,et al.  Upper Rio Grande water operations model: A tool for enhanced system management , 1999 .