MVC - a preprocessing method to deal with missing values

Abstract Many analysis tasks must deal with missing values, and most have developed specific, internal treatments to guess them. In this paper we present an external method, MVC (Missing Values Completion), that improves the performance of completion as well as the declarativity of the treatment and the interaction with the user. These qualities make it suitable for the data cleaning step of the Knowledge Discovery in Databases (KDD) process (U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery: an overview, in: Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, USA, 1996, pp. 1–36). The core of MVC is the Robust Association Rules (RAR) algorithm that we proposed earlier (A. Ragel, B. Cremilleux, Treatment of missing values for association rules, in: Proceedings of the Second Pacific–Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, Lecture Notes in Artificial Intelligence 1394, Springer, Berlin, 1998, pp. 258–270). This algorithm extends the concept of association rules (R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, USA, 1993, pp. 207–216) to databases with multiple missing values. It makes MVC an efficient preprocessing method: in our experiments with the C4.5 (J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, USA, 1993) decision tree program, MVC reduced the classification error rate by up to a factor of two, in addition to providing a significant gain in declarativity.
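
To make the idea of rule-based completion concrete, the following minimal Python sketch fills a missing attribute from association rules mined on the same data. It is an illustration only and not the RAR algorithm: it is restricted to rules with a single attribute in the antecedent, and it assumes that support and confidence are computed only over the records in which both the antecedent attribute and the target are known. The names `mine_rules` and `complete_target`, the `None` marker for a missing value, the thresholds, and the toy records are all hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value


def mine_rules(records, target, min_support=0.3, min_confidence=0.8):
    """Mine rules (attr = value) -> (target = outcome).

    Illustrative assumption: for each attribute, only records in which
    both that attribute and the target are known are counted.
    """
    rules = {}
    attributes = {a for r in records for a in r if a != target}
    for attr in attributes:
        usable = [r for r in records
                  if r.get(attr) is not MISSING
                  and r.get(target) is not MISSING]
        if not usable:
            continue
        antecedent = Counter(r[attr] for r in usable)
        joint = Counter((r[attr], r[target]) for r in usable)
        for (value, outcome), n in joint.items():
            support = n / len(usable)
            confidence = n / antecedent[value]
            if support >= min_support and confidence >= min_confidence:
                rules[(attr, value, outcome)] = confidence
    return rules


def complete_target(records, rules, target):
    """Return copies of the records with a missing target filled in by the
    most confident matching rule; records with no matching rule stay
    incomplete rather than being guessed."""
    completed = []
    for r in records:
        r = dict(r)  # do not modify the caller's data
        if r.get(target) is MISSING:
            candidates = [(conf, outcome)
                          for (attr, value, outcome), conf in rules.items()
                          if r.get(attr) == value]
            if candidates:
                r[target] = max(candidates, key=lambda c: c[0])[1]
        completed.append(r)
    return completed


# Toy data, invented for this example.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": None, "play": "no"},
    {"outlook": "sunny", "windy": None, "play": None},
    {"outlook": None, "windy": "yes", "play": None},
    {"outlook": None, "windy": None, "play": None},
]
rules = mine_rules(records, target="play")
completed = complete_target(records, rules, target="play")
```

Because each retained rule remains visible as an (attribute, value, outcome) triple with its confidence, a user can inspect or veto a rule before it is applied, which is one way to read the declarativity and user interaction mentioned above. A record for which no rule fires is left incomplete.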