A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy)

Environmental time series are often affected by missing data, and statistical analysis of such series usually requires that the gaps be filled by estimating the missing values. A large number of statistical techniques are available for this purpose, ranging from very simple methods, such as substituting the sample mean, to very sophisticated ones, such as multiple imputation. A new methodology for missing data estimation is proposed that tries to combine the advantages of the simplest techniques (e.g. their ease of implementation) with the strength of the newest ones. The proposed method consists of two consecutive stages: in the first, once a monitoring station has been found to be affected by missing data, the “most similar” stations are identified among the neighbouring stations on the basis of a suitable similarity coefficient; in the second, a regression method is applied to estimate the missing data. In this paper, four different regression methods are applied and compared, using rainfall data series measured in the Candelaro River Basin in southern Italy, to determine which is the most reliable for filling in the gaps.
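
The two-stage idea can be illustrated with a minimal Python sketch. The abstract names neither the similarity coefficient nor the four regression methods, so the choices here are assumptions made only for illustration: Pearson correlation stands in for the similarity coefficient and ordinary least-squares linear regression for the regression step. The function names `most_similar_station` and `fill_gaps` and the toy rainfall values are hypothetical.

```python
import numpy as np

def most_similar_station(target, neighbours):
    """Stage 1: pick the neighbour most similar to the target station.

    Similarity is ASSUMED to be the Pearson correlation computed on the
    days where both series are observed (the paper's coefficient may differ).
    """
    best_idx, best_r = None, -np.inf
    for i, series in enumerate(neighbours):
        both = ~np.isnan(target) & ~np.isnan(series)
        # Skip neighbours with too little overlap or no variability.
        if both.sum() < 3:
            continue
        if np.std(target[both]) == 0 or np.std(series[both]) == 0:
            continue
        r = np.corrcoef(target[both], series[both])[0, 1]
        if r > best_r:
            best_idx, best_r = i, r
    return best_idx, best_r

def fill_gaps(target, donor):
    """Stage 2: estimate the target's missing values from the donor series.

    ASSUMED regression: a least-squares straight line fitted on the
    jointly observed days; estimates are clipped at zero because
    rainfall cannot be negative.
    """
    both = ~np.isnan(target) & ~np.isnan(donor)
    slope, intercept = np.polyfit(donor[both], target[both], 1)
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(donor)
    filled[gaps] = np.maximum(slope * donor[gaps] + intercept, 0.0)
    return filled

# Toy daily rainfall (mm); NaN marks a missing day at the target station.
target = np.array([0.0, 2.0, np.nan, 6.0, 8.0, np.nan])
neighbours = [
    np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]),
    np.array([5.0, 1.0, 0.0, 2.0, 1.0, 3.0]),
]

idx, r = most_similar_station(target, neighbours)
filled = fill_gaps(target, neighbours[idx])
print(idx, filled)
```

In this toy case the first neighbour is selected, since the target is exactly twice its value on every jointly observed day, and the two gaps are filled from the fitted line. A single donor station is used only to keep the sketch short; the method described above identifies the "most similar" stations, in the plural.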
