The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data

The reliability and accuracy of a risk assessment of extreme hydro-meteorological events depend heavily on the quality of the historical rainfall time series data. Missing values in such a series, however, degrade its quality. This paper therefore proposes a multiple-imputation algorithm for treating missing data that does not require information from adjoining monitoring stations. The proposed algorithm is based on the M-component probabilistic principal component analysis model and an expectation-maximisation algorithm (MPPCA-EM). To evaluate the effectiveness of the MPPCA-EM imputation algorithm, it was applied to six historical daily rainfall time series recorded at six monitoring stations located in the coastal and inland regions of the East-Coast Economic Region (ECER), Malaysia. The results show that, for missing historical daily rainfall data recorded at coastal monitoring stations, the 2-component probabilistic principal component analysis model and expectation-maximisation algorithm (2PPCA-EM) was superior to the single- and multiple-imputation algorithms proposed in previous studies. In contrast, for missing historical daily rainfall data recorded at inland monitoring stations, the single-imputation algorithms proposed in previous studies were superior to the MPPCA-EM imputation algorithms.
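
The general idea of filling gaps with a probabilistic PCA model can be illustrated with a minimal Python sketch. It is not the paper's MPPCA-EM implementation: it uses a simplified fill-and-refit loop in the spirit of EM (the closed-form maximum-likelihood PPCA fit is recomputed on the currently filled data, then the missing entries are replaced by the model reconstruction), it produces a single imputation rather than multiple imputations, and it assumes the single-station series has already been arranged as a matrix with more columns than retained components. The function name `ppca_impute`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def ppca_impute(X, n_components=2, n_iter=200, tol=1e-6):
    """Fill NaN entries of X using an iteratively refitted PPCA model.

    Simplified sketch (assumption): not the exact EM updates for PPCA
    with missing data, and not the paper's implementation.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    q = n_components
    if not 0 < q < d:
        raise ValueError("n_components must be between 1 and n_columns - 1")

    missing = np.isnan(X)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        centered = filled - mu
        cov = centered.T @ centered / n

        # Closed-form ML PPCA fit from the eigendecomposition of cov.
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        sigma2 = max(eigvals[q:].mean(), 1e-12)  # noise variance
        W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))

        # Posterior mean of the latent variables, then reconstruction.
        M = W.T @ W + sigma2 * np.eye(q)
        Z = np.linalg.solve(M, W.T @ centered.T).T
        recon = Z @ W.T + mu

        # Observed entries are never altered; only gaps are updated.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy low-rank data standing in for a reshaped rainfall series.
    data = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))
    data_missing = data.copy()
    data_missing[rng.random(data.shape) < 0.1] = np.nan
    completed = ppca_impute(data_missing, n_components=2)
    print(np.isnan(completed).sum())
```

With real rainfall data the reconstructed values can be negative, so a post-processing step such as clipping at zero would likely be needed; that choice is also an assumption and is not taken from the paper.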
