Detecting Anomaly and Replacement Prediction for Rainfall Open Data in Thailand

Rainfall data sets often contain missing values because the sensors break easily. In Thailand, many public agencies collect rainfall data, including National Hydro Informatics (HII) and the Thai Meteorological Department, because the data are valuable for rainfall prediction, which is important for an agricultural country like Thailand. Rainfall is normally recorded hourly, and the large number of sensor locations makes the sensors hard to maintain. Sensor readings can be lost transiently or can take anomalous values. Because a large volume of data flows to the server every day, it is hard to inspect manually or even semi-manually. This project collaborates with HII to develop a system that automates the rainfall data quality improvement process, using machine learning algorithms as tools for data cleansing. The cleansed data can be published as an open data set for developers to explore and build new innovations. We explore the characteristics of the data set and adopt both statistical and machine learning methods. The results show that combining statistical and machine learning methods yields higher accuracy than using either approach alone. We also develop a web application that visualizes the rainfall data after cleansing and connects to the models for the automatic cleansing pipelines.
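
As a rough illustration of how a statistical check and a learned replacement model might be chained, the sketch below first flags missing or implausible hourly readings with a simple range rule, then fills the flagged hours using a least-squares line fitted against a neighboring station. This is not the method used in this work: the function names, the 200 mm/h plausibility cap, and the use of a single neighboring station with a linear model are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical plausibility cap for one hour of rainfall (mm); not from the paper.
MAX_HOURLY_MM = 200.0


def flag_anomalies(values, max_mm=MAX_HOURLY_MM):
    """Statistical step: turn missing, negative, or implausibly large readings into None."""
    return [v if v is not None and 0.0 <= v <= max_mm else None for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Neighbor readings are constant: fall back to the target mean.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def clean_series(target, neighbor):
    """Flag bad readings in `target`, then predict replacements from `neighbor`.

    Both inputs are equal-length lists of hourly rainfall in mm, with None for
    missing hours. A simple linear model stands in for the learned model here.
    """
    flagged = flag_anomalies(target)
    pairs = [(n, t) for n, t in zip(neighbor, flagged)
             if n is not None and t is not None]
    if len(pairs) < 2:
        return flagged  # not enough valid hours to fit a model
    a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    cleaned = []
    for t, n in zip(flagged, neighbor):
        if t is not None:
            cleaned.append(t)            # keep valid observations unchanged
        elif n is not None:
            cleaned.append(max(0.0, a * n + b))  # rainfall cannot be negative
        else:
            cleaned.append(None)         # no neighbor reading to predict from
    return cleaned


if __name__ == "__main__":
    target = [0.0, 2.0, None, 999.0, 4.0]   # one missing hour, one spike
    neighbor = [0.0, 1.0, 1.5, 3.0, 2.0]
    print(clean_series(target, neighbor))
```

In this sketch the range rule decides which hours need replacing and the fitted model supplies the replacement values, so valid observations are never overwritten.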