Selection of an optimal algorithm for outlier detection in GNSS time series
In data mining, outliers can lead to misleading interpretations of statistical results, particularly in deformation monitoring, where the analysis of deformations relies on fluctuations and disturbances simulated by numerical models. Outlier filtering therefore cannot be ignored in data standardization. However, no single filtering algorithm is likely to be efficient for every data pattern. We investigate five outlier filtering algorithms in MATLAB® (Release 2020a), namely moving average, moving median, quartiles, Grubbs, and generalized extreme Studentized deviate (GESD), to select the optimal algorithm for GNSS time series data. The study is conducted on two types of data, used for ionosphere disturbance analysis in the Ring of Fire region and for crustal deformation monitoring in Germany: one exhibits seasonal time series patterns and the other follows trend models. We apply simple random sampling, which follows the principles of unbiased survey techniques. The optimal algorithm is selected based on the sensitivity of outlier detection and the performance of the central tendency measures. Algorithm robustness is also tested by varying the random outliers while maintaining the standard distribution of each dataset. Our results show that the moving median algorithm is the most sensitive in outlier detection because it is a robust statistic that is not affected by anomalies, followed in turn by quartiles, GESD, and Grubbs. The moving average algorithm is the least efficient at filtering outliers, with an outlier detection percentage below 20% relative to the moving median (at the corresponding 95% probability). In deformation analysis, however, disturbances in numerical models are often the basis for motion assessment, and moving median filtering smooths these anomalies out. The quartiles algorithm can therefore be considered in this case.
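The detectors compared above differ mainly in how they estimate the local centre and spread of the data. The following Python sketch implements three of them (moving median, moving average, quartiles) on a hypothetical seasonal series. The thresholds (3 scaled MADs, 3 standard deviations, 1.5 IQR) are assumed to mirror the default threshold factors of MATLAB's `isoutlier`; the 11-epoch window, the synthetic data, and all function names are illustrative choices, not the settings used in this study. Grubbs and GESD, which need Student's t critical values, are omitted to keep the sketch dependency-free, and quartile interpolation rules differ between tools.

```python
import math
import random
import statistics

# Scale that makes the median absolute deviation (MAD) consistent with the
# standard deviation for Gaussian noise: 1 / Phi^-1(3/4) ~= 1.4826.
MAD_SCALE = 1.4826


def _centred_windows(x, window):
    """Yield (i, neighbourhood) using a centred window truncated at both ends."""
    half = window // 2
    for i in range(len(x)):
        yield i, x[max(0, i - half): i + half + 1]


def moving_median_outliers(x, window=11, factor=3.0):
    """Flag points more than `factor` scaled MADs from the local median."""
    flags = []
    for i, w in _centred_windows(x, window):
        med = statistics.median(w)
        mad = MAD_SCALE * statistics.median([abs(v - med) for v in w])
        flags.append(abs(x[i] - med) > factor * mad)
    return flags


def moving_mean_outliers(x, window=11, factor=3.0):
    """Flag points more than `factor` standard deviations from the local mean.
    The candidate outlier itself inflates the local mean and spread (masking)."""
    flags = []
    for i, w in _centred_windows(x, window):
        mu = statistics.fmean(w)
        sd = statistics.stdev(w)  # needs >= 2 samples, so use window >= 3
        flags.append(abs(x[i] - mu) > factor * sd)
    return flags


def quartile_outliers(x, factor=1.5):
    """Flag points outside [Q1 - factor*IQR, Q3 + factor*IQR] of the whole series."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    return [v < q1 - factor * iqr or v > q3 + factor * iqr for v in x]


# Hypothetical seasonal series: a sine with a 50-epoch period, Gaussian noise,
# and four injected spikes of +5 units.
rng = random.Random(42)
series = [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.05) for t in range(200)]
spikes = [20, 65, 130, 170]
for i in spikes:
    series[i] += 5.0

for name, flags in [
    ("moving median", moving_median_outliers(series)),
    ("moving mean", moving_mean_outliers(series)),
    ("quartiles", quartile_outliers(series)),
]:
    found = [i for i, f in enumerate(flags) if f]
    print(f"{name:14s} flagged {len(found):3d} points; spikes found: "
          f"{sum(i in found for i in spikes)}/{len(spikes)}")
```

One plausible reason for the weak moving-average result is masking: each spike enters its own window and pulls the local mean and standard deviation toward itself. With 11 samples, no single point can lie more than (n − 1)/√n ≈ 3.02 sample standard deviations from the window mean, so a factor of 3 leaves the moving average almost no room to flag anything, whereas the median and MAD are barely moved by a single spike.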
Overall, the moving median is best suited to filtering outliers in both seasonal and trend time series; for deformation analysis in particular, the optimal solution is to apply the quartiles algorithm or to extend the threshold factor and the sliding window of the moving median.
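The recommendation to extend the threshold factor and the sliding window of the moving median can be explored with a simple parameter sweep. The sketch below assumes a hypothetical linear trend with Gaussian noise and a genuine 8-epoch disturbance (arbitrary units, e.g. millimetres); the function name, window lengths, and factors are illustrative, not the values used in this study. For a fixed window, raising the factor can only shrink the set of flagged epochs. The effect of the window length depends on how long the disturbance lasts relative to the window and on the local trend, so the sketch reports counts rather than assuming a direction.

```python
import random
import statistics

# Scale that makes the MAD consistent with the standard deviation for Gaussian noise.
MAD_SCALE = 1.4826


def moving_median_flags(x, window, factor):
    """Flag x[i] if it lies more than `factor` scaled MADs from the median of a
    centred window of `window` samples (the window is truncated at the ends)."""
    half = window // 2
    flags = []
    for i in range(len(x)):
        w = x[max(0, i - half): i + half + 1]
        med = statistics.median(w)
        mad = MAD_SCALE * statistics.median([abs(v - med) for v in w])
        flags.append(abs(x[i] - med) > factor * mad)
    return flags


# Hypothetical trend series: a slow linear drift with Gaussian noise and a
# genuine 8-epoch disturbance that a deformation analysis would want to keep.
rng = random.Random(7)
series = [0.05 * t + rng.gauss(0.0, 1.0) for t in range(300)]
for t in range(150, 158):
    series[t] += 6.0

windows = (11, 31, 61)
factors = (3.0, 5.0, 8.0)
counts = {(w, f): sum(moving_median_flags(series, w, f)) for w in windows for f in factors}

for w in windows:
    row = ", ".join(f"factor {f:g}: {counts[(w, f)]:3d}" for f in factors)
    print(f"window {w:2d} -> flagged epochs ({row})")
```

A disturbance that fills more than half of the window shifts the local median itself and is therefore not flagged, while the same disturbance inside a much longer window leaves the median unchanged; a steeper trend, in turn, widens the local MAD as the window grows. Which combination preserves the disturbances of interest has to be checked on the actual data.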