Mining rules from an incomplete dataset with a high missing rate

Recovering missing values in a dataset has become an important research problem in data mining and machine learning. In this thesis, we introduce an iterative missing-value completion method that uses RAR (Robust Association Rules) support values to extract association rules useful for inferring missing values. The method consists of three phases. The first phase uses the association rules to roughly complete the missing values. The second phase iteratively lowers the minimum support threshold to gather more association rules, which are then used to complete the remaining missing values. The third phase uses association rules mined from the completed dataset to correct the values that were filled in. Experimental results show that the proposed approach achieves good accuracy and data recovery even when the missing-value rate is high.
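The three phases can be illustrated with a minimal sketch. It is not the implementation from the thesis. It assumes categorical records stored as dictionaries with `None` marking a missing value, restricts itself to single-antecedent rules, and computes support only over records in which both attributes are observed, which is a simplification and may differ from the RAR support definition used in the thesis. The function names (`mine_rules`, `best_value`, `complete_missing`) and all threshold values are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(rows, min_support, min_confidence):
    """Mine single-antecedent rules (a = x) -> (b = y) from observed cells."""
    attrs = sorted({key for row in rows for key in row})
    rules = []
    for a, b in permutations(attrs, 2):
        # Assumption: rows missing either attribute are ignored for this pair.
        usable = [r for r in rows if r.get(a) is not None and r.get(b) is not None]
        if not usable:
            continue
        pair_counts = Counter((r[a], r[b]) for r in usable)
        antecedent_counts = Counter(r[a] for r in usable)
        for (x, y), n in pair_counts.items():
            support = n / len(usable)
            confidence = n / antecedent_counts[x]
            if support >= min_support and confidence >= min_confidence:
                rules.append((confidence, support, a, x, b, y))
    # Strongest rules first, so they win when several rules apply.
    rules.sort(key=lambda rule: (rule[0], rule[1]), reverse=True)
    return rules


def best_value(row, attr, rules):
    """Return the consequent of the strongest rule that fires for attr, if any."""
    for _, _, a, x, b, y in rules:
        if b == attr and row.get(a) == x:
            return y
    return None


def complete_missing(rows, start_support=0.5, floor_support=0.1,
                     step=0.1, min_confidence=0.6):
    """Fill missing cells in three phases; returns a completed copy of rows."""
    rows = [dict(r) for r in rows]  # leave the caller's data untouched
    missing = [(i, k) for i, r in enumerate(rows) for k, v in r.items() if v is None]
    filled = []

    # Phases 1 and 2: the first pass gives a rough completion; each later
    # pass lowers the minimum support to admit more rules.
    support = start_support
    while missing and support >= floor_support - 1e-9:
        rules = mine_rules(rows, support, min_confidence)
        still_missing = []
        for i, k in missing:
            value = best_value(rows[i], k, rules)
            if value is None:
                still_missing.append((i, k))
            else:
                rows[i][k] = value
                filled.append((i, k))
        missing = still_missing
        support -= step

    # Phase 3: re-mine rules on the completed data and revisit filled cells.
    rules = mine_rules(rows, floor_support, min_confidence)
    for i, k in filled:
        value = best_value(rows[i], k, rules)
        if value is not None:
            rows[i][k] = value
    return rows


rows = [
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "sun", "umbrella": "no"},
    {"weather": "sun", "umbrella": "no"},
    {"weather": "rain", "umbrella": None},
    {"weather": None, "umbrella": "no"},
]
completed = complete_missing(rows)
```

On this toy dataset both missing cells are filled in the first pass, so the support-lowering loop ends immediately. Cells that no rule can reach, even at the lowest support, are left as `None` rather than guessed.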
