An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set

Many analysis tasks have to deal with missing values, and some have developed specific internal treatments to guess them. In this paper we present a new method for this problem, called MVC (Missing Values Completion). MVC is based on data preprocessing that gives prominence to understandable associations and gives the user a central role. These qualities make it suitable for the data cleaning step of the Knowledge Discovery in Databases process. The efficiency of this method rests on the Robust Association Rules algorithm that we have proposed, which extends the concept of association rules to databases with multiple missing values. We give some examples of the use of MVC on a real-world medical data set, highlighting typical uses of the method.
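
To make the idea of evaluating association rules on data with missing values concrete, the following is a minimal sketch and not the Robust Association Rules algorithm itself. It assumes one simple convention: a record is ignored for a given rule whenever any attribute used by that rule is missing. The function name `rule_stats`, the attribute names, and the toy records are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# A record maps attribute names to values; None marks a missing value.
Record = dict[str, Optional[str]]


def rule_stats(
    records: list[Record],
    antecedent: dict[str, str],
    consequent: dict[str, str],
) -> tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent.

    Assumption for this sketch: records with a missing value on any
    attribute involved in the rule are left out of the computation.
    """
    attrs = set(antecedent) | set(consequent)
    usable = [r for r in records if all(r.get(a) is not None for a in attrs)]
    if not usable:
        return 0.0, 0.0

    # Records that satisfy the left-hand side of the rule.
    match_ante = [
        r for r in usable if all(r[a] == v for a, v in antecedent.items())
    ]
    # Of those, records that also satisfy the right-hand side.
    match_both = [
        r for r in match_ante if all(r[a] == v for a, v in consequent.items())
    ]

    support = len(match_both) / len(usable)
    confidence = len(match_both) / len(match_ante) if match_ante else 0.0
    return support, confidence


# Hypothetical toy data with missing values (None).
records: list[Record] = [
    {"fever": "yes", "rash": "yes"},
    {"fever": "yes", "rash": "yes"},
    {"fever": "yes", "rash": None},
    {"fever": "no", "rash": "no"},
    {"fever": None, "rash": "yes"},
    {"fever": "yes", "rash": "no"},
]

support, confidence = rule_stats(records, {"fever": "yes"}, {"rash": "yes"})
```

In a completion setting, a rule with acceptable support and confidence could then be shown to the user as a candidate value for the record where `rash` is missing, leaving the final decision to the user.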
