Data Decomposition and Decision Rule Joining for Classification of Data with Missing Values

In this paper we present a new approach to handling incomplete information and reducing classifier complexity. We describe a method, called D3RJ, that performs data decomposition and decision rule joining to avoid the necessity of reasoning with missing attribute values. As a consequence, a more complex reasoning process is needed than in the case of known algorithms for induction of decision rules. The original incomplete data table is decomposed into sub-tables without missing values. Next, methods for induction of decision rules are applied to these sub-tables. Finally, an algorithm for decision rule joining is used to obtain the final rule set from the partial rule sets. Using the D3RJ method it is possible to obtain a smaller set of rules and better classification accuracy than with classic decision rule induction methods. We provide an empirical evaluation of the accuracy and model size of the D3RJ method on data with missing values of natural origin.
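The decomposition step can be illustrated with a minimal sketch. It assumes one simple strategy, grouping objects by the set of attributes whose values are known, which is not necessarily the decomposition criterion used by D3RJ. The names `MISSING`, `decompose_by_pattern`, and the `"decision"` key are hypothetical and chosen only for illustration. Rule induction and rule joining are not shown.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing attribute value


def decompose_by_pattern(rows, attributes):
    """Split an incomplete table into sub-tables without missing values.

    Rows are grouped by the tuple of attributes they have values for.
    This is an illustrative simplification, not the D3RJ criterion.
    """
    subtables = defaultdict(list)
    for row in rows:
        # Attributes whose value is known for this row.
        present = tuple(a for a in attributes if row.get(a) is not MISSING)
        # Keep only the known attributes plus the decision value.
        complete_row = {a: row[a] for a in present}
        complete_row["decision"] = row["decision"]
        subtables[present].append(complete_row)
    return dict(subtables)


rows = [
    {"a": 1, "b": 0, "c": None, "decision": "yes"},
    {"a": 0, "b": 1, "c": None, "decision": "no"},
    {"a": 1, "b": None, "c": 2, "decision": "yes"},
]
tables = decompose_by_pattern(rows, ["a", "b", "c"])
```

Each resulting sub-table is complete on its own attribute subset, so any standard rule induction method could be run on it separately before the partial rule sets are joined.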
