Rough Set Strategies to Data with Missing Attribute Values

In this paper we assume that a data set is presented in the form of an incompletely specified decision table, i.e., some attribute values are missing. We further assume that some of the missing attribute values are lost (e.g., erased) and some are "do not care" conditions (i.e., they were redundant or unnecessary for making a decision or classifying a case). Incompletely specified decision tables are described by characteristic relations, which reduce to the indiscernibility relation for completely specified decision tables. We show how to compute characteristic relations using the idea of blocks of attribute-value pairs, which is used in some rule induction algorithms, such as LEM2. Moreover, the set of all characteristic relations for a class of congruent incompletely specified decision tables, defined in the paper, is a lattice. Three definitions of lower and upper approximations are introduced. Finally, we show that the presented approach may also be used for kinds of missing attribute values other than lost values and "do not care" conditions.
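The computation of a characteristic relation from blocks of attribute-value pairs can be illustrated with a minimal Python sketch. The abstract does not spell out the definitions, so the sketch assumes the convention commonly used in this line of work: a case with a lost value ("?") belongs to no block of that attribute, a case with a "do not care" value ("*") belongs to every block of that attribute, and the characteristic set of a case is the intersection of the blocks selected by its specified values. The table, the attribute names, and all function names are hypothetical and chosen only for illustration.

```python
LOST = "?"        # assumed marker for a lost (e.g., erased) value
DONT_CARE = "*"   # assumed marker for a "do not care" condition


def attribute_value_blocks(table, attributes):
    """Map each (attribute, value) pair to the set of cases in its block."""
    blocks = {}
    for a in attributes:
        # Only values that are actually specified somewhere define blocks.
        known = {row[a] for row in table.values()
                 if row[a] not in (LOST, DONT_CARE)}
        for v in known:
            blocks[(a, v)] = set()
        for case, row in table.items():
            value = row[a]
            if value == LOST:
                continue                      # lost: in no block of a
            if value == DONT_CARE:
                for v in known:               # do not care: in every block of a
                    blocks[(a, v)].add(case)
            else:
                blocks[(a, value)].add(case)
    return blocks


def characteristic_sets(table, attributes):
    """For each case x, intersect the blocks of its specified values."""
    universe = set(table)
    blocks = attribute_value_blocks(table, attributes)
    result = {}
    for case, row in table.items():
        k = set(universe)                     # missing values impose no constraint
        for a in attributes:
            if row[a] not in (LOST, DONT_CARE):
                k &= blocks[(a, row[a])]
        result[case] = k
    return result


def characteristic_relation(table, attributes):
    """Pairs (x, y) such that y lies in the characteristic set of x."""
    sets = characteristic_sets(table, attributes)
    return {(x, y) for x, ks in sets.items() for y in ks}


# Hypothetical decision table (condition attributes only).
ATTRS = ["Temperature", "Headache"]
TABLE = {
    1: {"Temperature": "high",   "Headache": "yes"},
    2: {"Temperature": LOST,     "Headache": "yes"},
    3: {"Temperature": DONT_CARE, "Headache": "no"},
    4: {"Temperature": "normal", "Headache": "no"},
}

if __name__ == "__main__":
    for case, ks in characteristic_sets(TABLE, ATTRS).items():
        print(case, sorted(ks))
```

In this toy table the resulting relation is reflexive but not symmetric: case 1 is in the characteristic set of case 2, but not the other way around. On a table with no missing values the same code yields the equivalence classes of the indiscernibility relation.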
