Alternative ways to handle missing values problem: A case study in earthquake dataset

A dataset is the basic foundation for understanding a problem, and it gives researchers the information they need to find solutions. Errors during data retrieval can leave the data incomplete, which raises the problem of how to recover the missing values. The first step is to examine the characteristics of the data. In this paper, we propose three alternative ways to obtain the missing values of a dataset, and we apply them to an earthquake dataset, which has special properties. We then present the results to assess the performance of the proposed methods. The results show good agreement for the missing data. These are preliminary results of our research on missing data in earthquake datasets. The study has limitations: for example, when the missing values occur in a sufficiently large block of data, the methods need to be improved.
