Missing data imputation by utilizing information within incomplete instances

This paper proposes using the information contained in incomplete instances (instances with missing values) when estimating missing values. To this end, we design a simple and efficient nonparametric iterative imputation algorithm, called the NIIA method, for imputing missing target values. The NIIA method imputes each missing value repeatedly until the algorithm converges. In the first iteration, only the complete instances are used to estimate missing values. From the second iteration onward, the information within incomplete instances is used as well. We conduct experiments to evaluate the method's efficiency and demonstrate that: (1) using the information within incomplete instances helps capture the distribution of a dataset; and (2) the NIIA method outperforms existing methods in accuracy, an advantage that is most pronounced when datasets have a high missing ratio.
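The iteration scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the estimator, so the Gaussian-kernel weighted average, the `bandwidth`, `max_iter`, and `tol` parameters, and the function names `kernel_estimate` and `iterative_impute` are all assumptions made for illustration.

```python
import numpy as np


def kernel_estimate(X_train, y_train, x, bandwidth):
    """Gaussian-kernel weighted average of y_train around x (assumed estimator)."""
    sq_dist = np.sum((X_train - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    total = weights.sum()
    if total <= 1e-12:
        # All training points are too far away: fall back to the plain mean.
        return float(y_train.mean())
    return float(np.dot(weights, y_train) / total)


def iterative_impute(X, y, bandwidth=1.0, max_iter=20, tol=1e-4):
    """Impute NaN entries of the target y from fully observed features X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    miss_idx = np.flatnonzero(missing)
    if miss_idx.size == 0:
        return y
    observed = ~missing
    if not observed.any():
        raise ValueError("at least one complete instance is required")

    # First iteration: only complete instances contribute to the estimates.
    for i in miss_idx:
        y[i] = kernel_estimate(X[observed], y[observed], X[i], bandwidth)

    # Later iterations: previously imputed instances contribute as well.
    for _ in range(max_iter):
        updated = y.copy()
        for i in miss_idx:
            others = np.ones(len(y), dtype=bool)
            others[i] = False  # leave out the instance being re-imputed
            updated[i] = kernel_estimate(X[others], y[others], X[i], bandwidth)
        change = np.max(np.abs(updated[miss_idx] - y[miss_idx]))
        y = updated
        if change < tol:  # stop once the imputed values have stabilised
            break
    return y
```

In this sketch, calling `iterative_impute(X, y)` with `np.nan` marking the missing targets returns a copy of `y` in which observed values are untouched and missing ones are filled. Each pass re-estimates every missing value from the previous pass's values, so the order in which instances are visited does not affect the result.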
