RF-MEP: A novel Random Forest method for merging gridded precipitation products and ground-based measurements

Abstract

The accurate representation of spatio-temporal patterns of precipitation is an essential input for numerous environmental applications. However, precipitation patterns estimated solely from rain gauges are subject to large uncertainties. We present the Random Forest based MErging Procedure (RF-MEP), which combines information from ground-based measurements, state-of-the-art precipitation products, and topography-related features to improve the representation of the spatio-temporal distribution of precipitation, especially in data-scarce regions. RF-MEP is applied over Chile for 2000–2016, using daily measurements from 258 rain gauges for model training and 111 stations for validation. Two merged datasets were computed: RF-MEP3P (based on PERSIANN-CDR, ERA-Interim, and CHIRPSv2) and RF-MEP5P (which additionally includes CMORPHv1 and TRMM 3B42v7). The performance of the two merged products and of the products used in their computation was compared with that of MSWEPv2.2, a state-of-the-art global merged product. A validation against ground-based measurements was carried out at different temporal scales using both continuous and categorical performance indices. RF-MEP3P and RF-MEP5P outperformed all the precipitation datasets used in their computation as well as the products derived using other merging techniques, and generally outperformed MSWEPv2.2. The merged precipitation products showed improvements in the linear correlation, bias, and variability of precipitation at different temporal scales, as well as in the probability of detection, false alarm ratio, frequency bias, and critical success index for different precipitation intensities. RF-MEP performed well even when the training dataset was reduced to 10% of the available rain gauges. Our results suggest that RF-MEP could be successfully applied to any other region and to correct other climatological variables, provided that ground-based data are available.
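The abstract reports skill in terms of linear correlation, bias and variability, and in terms of four contingency-table scores. The exact formulas used in the study are not given in this section, so the sketch below assumes the widely used definitions. The continuous part follows the modified Kling-Gupta efficiency (KGE′) of Kling et al. (2012), which builds on Gupta et al. (2009): r is the Pearson correlation, β = μs/μo is the ratio of means, γ = (σs/μs)/(σo/μo) is the ratio of coefficients of variation, and KGE′ = 1 − sqrt((r − 1)² + (β − 1)² + (γ − 1)²). The categorical part computes POD, FAR, frequency bias (FBI) and CSI from hits, misses and false alarms above an intensity threshold. The series, the function names and the 1 mm/day threshold are hypothetical.

```python
from math import nan, sqrt
from statistics import mean, pstdev


def _ratio(num, den):
    # Return NaN instead of raising when a contingency count is zero.
    return num / den if den else nan


def kge_components(sim, obs):
    """Correlation r, bias ratio beta, variability ratio gamma and KGE' (Kling et al., 2012)."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    # Pearson correlation; undefined if either series is constant.
    r = cov / sqrt(sum((s - ms) ** 2 for s in sim) * sum((o - mo) ** 2 for o in obs))
    beta = ms / mo                                   # ratio of means
    gamma = (pstdev(sim) / ms) / (pstdev(obs) / mo)  # ratio of coefficients of variation
    kge = 1 - sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return r, beta, gamma, kge


def categorical_scores(sim, obs, threshold=1.0):
    """POD, FAR, frequency bias (FBI) and CSI for events with P >= threshold (mm/day)."""
    hits = sum(s >= threshold and o >= threshold for s, o in zip(sim, obs))
    misses = sum(s < threshold and o >= threshold for s, o in zip(sim, obs))
    false_alarms = sum(s >= threshold and o < threshold for s, o in zip(sim, obs))
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "FBI": _ratio(hits + false_alarms, hits + misses),
        "CSI": _ratio(hits, hits + misses + false_alarms),
    }


obs = [0.0, 2.0, 5.0, 0.5, 10.0, 0.0]  # hypothetical gauge series (mm/day)
sim = [0.2, 1.5, 6.0, 1.2, 8.0, 0.0]   # hypothetical product series at the same gauge
print(categorical_scores(sim, obs))
print(kge_components(sim, obs))
```

For the toy series there are three hits, no misses and one false alarm (the 1.2 mm/day estimate on a 0.5 mm/day day), so POD = 1, FAR = 0.25, FBI ≈ 1.33 and CSI = 0.75. Whether a value exactly at the threshold counts as an event (>= versus >) is a convention; this sketch counts it.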
An R package implementing RF-MEP is freely available online at https://github.com/hzambran/RFmerge.
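The RFmerge package linked above is the reference R implementation. As an independent illustration of the idea described in the abstract, the sketch below fits a random forest regression (Breiman, 2001) from scratch in Python. For each training gauge the predictors are assumed to be the values of several gridded precipitation products at the pixel containing the gauge plus the elevation, and the target is the gauge observation for a single day. The fitted forest can then be applied to any grid cell's predictor vector to obtain a merged estimate. This is not the RFmerge code: the predictor set, the one-day training window, the synthetic data generator and all parameter values (`n_trees`, `max_depth`, `min_leaf`, `n_try`) are hypothetical, and trees are depth-limited only to keep the example small.

```python
import random
from statistics import mean


def sse(values):
    # Sum of squared deviations from the mean: the regression-tree split criterion.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    # Stop at max depth or on small/constant nodes; a leaf stores the mean gauge value.
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    # Only a random subset of n_try predictors is considered at each split.
    for j in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return mean(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_try, rng)

    return (j, t, grow(left), grow(right))


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are numbers.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=30, max_depth=6, min_leaf=2, n_try=None, seed=0):
    rng = random.Random(seed)
    n_try = n_try or max(1, len(X[0]) // 3)  # p/3 is a common default for regression
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample of gauges
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, min_leaf, n_try, rng))
    return forest


def predict_forest(forest, x):
    # Merged estimate = average of the individual tree predictions.
    return mean(predict_tree(tree, x) for tree in forest)


def synthetic_gauges(n, rng):
    # Hypothetical data: [product_1, product_2, product_3, elevation_m] -> gauge P (mm/day).
    X, y = [], []
    for _ in range(n):
        elev = rng.uniform(0, 3000)
        truth = max(0.0, rng.gauss(5 + elev / 1000, 2))
        products = [max(0.0, truth * f + rng.gauss(0, 1.5)) for f in (0.8, 1.2, 1.0)]
        X.append(products + [elev])
        y.append(truth)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_gauges(60, random.Random(42))
    forest = fit_forest(X, y, seed=1)
    cell = [4.0, 6.5, 5.2, 1200.0]  # hypothetical grid cell: three product values + elevation
    print(f"merged estimate: {predict_forest(forest, cell):.2f} mm/day")
```

Because every prediction is an average of leaf means, which are themselves averages of gauge values, the merged estimate always lies within the range of the training observations: random forest regression does not extrapolate beyond the observed gauge values.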
