Estimation of Individual Micro Data from Aggregated Open Data

In this paper, we propose a method for estimating individual micro data from aggregated open data, based on semi-supervised learning and conditional probability. First, the method collects the aggregated open data together with support data related to the individual micro data to be estimated. Next, it applies locality-sensitive hashing (LSH) to find the subset of the support data that is similar to the aggregated open data, and classifies the records in that subset with an ensemble classification model trained by semi-supervised learning. Finally, it uses conditional probability to estimate the individual micro data by selecting, from the classification results, the record that best fits the probability distribution of the individual micro data. To evaluate the proposed method, we estimated building-level data for buildings where fires occurred, using aggregated open fire data. In these experiments, the proposed method achieved an average estimation accuracy of 59.41%.
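
As an illustration only, the candidate-search and final selection steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: it assumes records are encoded as non-empty sets of "attribute=value" tokens, uses MinHash with banding as one possible LSH scheme, and scores candidates as a product of per-attribute conditional probabilities (an independence assumption). The ensemble classification step is omitted, and all function and parameter names (minhash_signature, lsh_candidates, most_probable_record, cond_prob) are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set, Tuple


def _hash(token: str, seed: int) -> int:
    # Deterministic 64-bit hash of a token under a given seed.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens: Set[str], num_hashes: int = 32) -> Tuple[int, ...]:
    # One minimum per seeded hash function; tokens must be non-empty.
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def lsh_candidates(query: Set[str], support: Dict[str, Set[str]],
                   num_hashes: int = 32, bands: int = 16) -> Set[str]:
    # Assumption: num_hashes is divisible by bands.
    rows = num_hashes // bands
    buckets = defaultdict(set)
    # Index every support record under each band of its signature.
    for rec_id, tokens in support.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    # A support record is a candidate if it shares at least one band
    # with the query (the aggregated record).
    qsig = minhash_signature(query, num_hashes)
    found: Set[str] = set()
    for b in range(bands):
        found |= buckets.get((b, qsig[b * rows:(b + 1) * rows]), set())
    return found


def most_probable_record(candidates: Dict[str, Dict[str, str]],
                         cond_prob: Dict[str, Dict[str, float]]) -> str:
    # cond_prob[attr][value] is assumed to hold P(attr = value | event),
    # derived from the aggregated open data. Attributes are treated as
    # independent, which is a simplifying assumption of this sketch.
    def score(record: Dict[str, str]) -> float:
        p = 1.0
        for attr, value in record.items():
            p *= cond_prob.get(attr, {}).get(value, 0.0)
        return p

    # Return the id of the candidate with the highest score.
    return max(candidates, key=lambda rec_id: score(candidates[rec_id]))
```

In this sketch, lsh_candidates narrows the support data to records that share at least one signature band with the aggregated record, and most_probable_record then picks the single candidate whose attribute values are most likely under the aggregated distribution.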