Missing Values Imputation Based on Iterative Learning

Databases used in machine learning and data mining often contain missing values. Developing effective methods for missing-value imputation is therefore a crucial problem in both fields. This paper reviews several methods for dealing with missing values in incomplete data and proposes a new imputation method based on iterative learning. The proposed method rests on a basic assumption: cause-effect connections exist among condition attribute values, so missing values can be induced from known values. During imputation, some of the missing values are filled in first and converted to known values, which are then used in the next imputation step. The iterative learning process continues until the incomplete data set is entirely converted to a complete one. The paper also presents an example that illustrates the framework of iterative learning for missing-value imputation.
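The iterative loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not specify how values are induced, so the sketch assumes a simple rule (a missing value is filled when every row that shares one known attribute value with the incomplete row agrees on the target value). The names `induce_value`, `iterative_impute`, `MISSING`, and the sample `table` are hypothetical.

```python
MISSING = None  # assumed marker for a missing attribute value


def induce_value(rows, r, target):
    """Try to induce rows[r][target] from the known values of row r.

    Assumed rule: for each known attribute of row r, collect the target
    values of all rows that share that attribute value. If they all agree,
    return the agreed value; otherwise try the next attribute.
    """
    for source, source_value in rows[r].items():
        if source == target or source_value is MISSING:
            continue
        candidates = {
            other[target]
            for other in rows
            if other[source] == source_value and other[target] is not MISSING
        }
        if len(candidates) == 1:
            return next(iter(candidates))
    return MISSING


def iterative_impute(rows, max_rounds=100):
    """Fill missing values round by round; return (rows, rounds_used).

    Values filled in one round count as known values in the next round.
    The loop stops when a round fills nothing, which may leave some
    values missing if no rule applies.
    """
    rows = [dict(row) for row in rows]  # work on a copy
    rounds = 0
    for _ in range(max_rounds):
        fills = {}
        for r, row in enumerate(rows):
            for attr, value in row.items():
                if value is MISSING:
                    induced = induce_value(rows, r, attr)
                    if induced is not MISSING:
                        fills[(r, attr)] = induced
        if not fills:
            break
        # Apply the whole round at once so each round uses one snapshot.
        for (r, attr), value in fills.items():
            rows[r][attr] = value
        rounds += 1
    return rows, rounds


# Hypothetical incomplete table: row 2's C can only be induced
# after other values have been filled in the first round.
table = [
    {"A": 1, "B": "x", "C": MISSING},
    {"A": MISSING, "B": "x", "C": "p"},
    {"A": 1, "B": MISSING, "C": MISSING},
    {"A": 2, "B": "y", "C": "q"},
]
completed, rounds_used = iterative_impute(table)
```

In this sample, the first round fills three values from existing known values, and the second round fills the remaining value of row 2 using the values converted to known in the first round. Applying fills in batches per round is a design choice that makes the rounds explicit; filling values immediately as they are induced would also fit the description in the abstract.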
