Estimating Rainfall with Multi-Resource Data over East Asia Based on Machine Learning

Inaccurate estimation of intense precipitation is a common limitation of precipitation retrieval. We therefore present a new rainfall retrieval technique based on the Random Forest (RF) algorithm, using infrared spectral data from the Advanced Himawari Imager aboard Himawari-8 (Himawari-8/AHI) and forecast information from the NCEP operational Global Forecast System (GFS). Gauge-calibrated rainfall estimates from the Global Precipitation Measurement (GPM) product served as the ground truth for training the model. A two-step RF classification model was established for (1) rain area delineation and (2) precipitation grade estimation, with the aim of improving accuracy for moderate and heavy rain. Because the class distribution in the datasets is imbalanced, resampling techniques, namely Random Under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE), were applied throughout training so that the model could learn the characteristics of every class. Among the features used, meteorological variables generally contributed more to the trained models than infrared information; precipitable water contributed the most, underscoring how essential water vapor conditions are to rainfall forecasting. The RF model results were compared with the GPM product pixel by pixel. To demonstrate the generality of the model, we used independent validation sets that were not used for training and two independent testing sets drawn from periods different from that of the training set. In addition, the algorithm was validated against independent rain gauge data and compared with GFS model rainfall. The RF model identified rainfall areas with a Probability Of Detection (POD) of around 0.77 and a False-Alarm Ratio (FAR) of around 0.23 for validation, and a POD of 0.60–0.70 and a FAR of around 0.30 for testing.
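For readers unfamiliar with the two scores, the following is a minimal sketch of how POD and FAR are conventionally computed from a pixel-by-pixel rain/no-rain comparison. It uses the standard contingency-table definitions; the function names and the ten-pixel sample are illustrative assumptions and are not taken from the study.

```python
# Minimal sketch: POD and FAR from a pixel-wise rain / no-rain comparison.
# The sample values below are made up for illustration only.

def contingency(pred, obs):
    """Count hits, misses and false alarms for binary rain flags."""
    hits = sum(1 for p, o in zip(pred, obs) if p and o)
    misses = sum(1 for p, o in zip(pred, obs) if not p and o)
    false_alarms = sum(1 for p, o in zip(pred, obs) if p and not o)
    return hits, misses, false_alarms

def pod(hits, misses):
    """Probability Of Detection: share of observed rain pixels that were detected."""
    total = hits + misses
    return hits / total if total else float("nan")

def far(hits, false_alarms):
    """False-Alarm Ratio: share of predicted rain pixels that were not observed."""
    total = hits + false_alarms
    return false_alarms / total if total else float("nan")

# Hypothetical flags: 1 = rain, 0 = no rain.
observed  = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

h, m, f = contingency(predicted, observed)
print(f"POD = {pod(h, m):.2f}, FAR = {far(h, f):.2f}")
```

A higher POD means fewer missed rain pixels, while a lower FAR means fewer spurious rain detections; the two are usually reported together because either one can be improved at the expense of the other.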
For precipitation grade estimation, the classification accuracy was 0.70 in validation and 0.60 in testing, despite some overestimation. In summary, the performance on the validation and test data indicates the strong adaptability and advantages of the RF algorithm for rainfall retrieval over East Asia. To a certain extent, our study provides a meaningful division of rainfall ranges and useful guidance for quantitative precipitation estimation.
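The two-step scheme described in the abstract (rain area delineation followed by grade classification on rainy pixels, with resampling to counter class imbalance) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the features and labels are synthetic, the grade thresholds are hypothetical, and only random under-sampling is shown, written by hand with NumPy. SMOTE would slot into the same place before fitting, for example through the `SMOTE` class in the imbalanced-learn package.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def random_undersample(X, y, rng):
    """Randomly drop samples so every class is as large as the rarest one."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

# Synthetic stand-in for the predictors (infrared and meteorological features).
n = 3000
X = rng.normal(size=(n, 4))
# Hypothetical rain rate: zero for most pixels, positive for a minority.
rate = np.where(X[:, 0] + 0.5 * X[:, 1] > 1.0, np.exp(X[:, 0] + X[:, 2]), 0.0)

is_rain = (rate > 0).astype(int)            # step 1 target: rain / no rain
# Step 2 target: 1 = light, 2 = moderate, 3 = heavy (thresholds are assumptions).
grade = np.digitize(rate, [2.0, 8.0]) + 1

X_train, X_test = X[:2000], X[2000:]
rain_train, grade_train = is_rain[:2000], grade[:2000]

# Step 1: rain area delineation on a class-balanced training set.
X1, y1 = random_undersample(X_train, rain_train, rng)
clf_rain = RandomForestClassifier(n_estimators=50, random_state=0).fit(X1, y1)

# Step 2: grade classification, trained on rainy pixels only and rebalanced.
rainy_train = rain_train == 1
X2, y2 = random_undersample(X_train[rainy_train], grade_train[rainy_train], rng)
clf_grade = RandomForestClassifier(n_estimators=50, random_state=0).fit(X2, y2)

# Prediction: grade 0 means no rain; grades are assigned only where step 1 says rain.
final = np.zeros(len(X_test), dtype=int)
rainy_pred = clf_rain.predict(X_test) == 1
if rainy_pred.any():
    final[rainy_pred] = clf_grade.predict(X_test[rainy_pred])

print(np.bincount(final, minlength=4))      # pixel counts for grades 0..3
```

Splitting the task this way keeps the large no-rain majority out of the grade classifier, so the second model only has to separate rainfall intensities from one another.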
