Hyperrelations in version space

Abstract A version space is the set of all hypotheses consistent with a given set of training examples, delimited by the specific boundary and the general boundary. In existing studies [Machine Learning 17(1) (1994) 5; Proc. 5th IJCAI (1977) 305; Artificial Intelligence 18 (1982)] a hypothesis is a conjunction of attribute-value pairs, which has been shown to have limited expressive power [Machine Learning, The McGraw-Hill Companies, Inc. (1997)]. In a more expressive hypothesis space, e.g., disjunctions of conjunctions of attribute-value pairs, the version space becomes uninteresting unless some restriction (inductive bias) is imposed [Machine Learning, The McGraw-Hill Companies, Inc. (1997)]. In this paper we investigate the version space in a hypothesis space where a hypothesis is a hyperrelation, in effect a disjunction of conjunctions of disjunctions of attribute-value pairs. This hypothesis space is more expressive than both conjunctions of attribute-value pairs and disjunctions of conjunctions of attribute-value pairs. Given a dataset, however, we restrict our attention to those hypotheses that are consistent with the data and are maximal, in the sense that the elements of a hypothesis cannot be merged further. Such a hypothesis is called an E-set for the given data, and the set of all E-sets is the version space, delimited by the least E-set (the specific boundary) and the greatest E-set (the general boundary). Based on this version space we propose three classification rules for use in different situations. The first two are based on E-sets; the third is based on “degraded” E-sets, called weak hypotheses, for which the maximality constraint is relaxed. We present an algorithm to compute E-sets, though it is computationally expensive in the worst case, and an efficient algorithm to compute weak hypotheses. The third rule is evaluated on public datasets, and the results compare well with the C5.0 decision tree classifier.
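To make the representation concrete, here is a minimal Python sketch of our own (not the paper's E-set algorithm; the attribute names and toy data are invented for illustration). It encodes a hypothesis as a hyperrelation, i.e., a set of hypertuples, each mapping an attribute to a set of admissible values, checks consistency with labelled examples, and performs one greedy merge step in the spirit of the weak-hypothesis construction, where the maximality requirement is relaxed.

    # Minimal sketch, assuming hypertuples are dicts from attribute names
    # to sets of admissible values; not the paper's algorithm.
    ATTRS = ["shape", "colour"]  # hypothetical attributes

    def covers(hypertuple, example):
        # A hypertuple covers an example if, for every attribute, the
        # example's value lies in the hypertuple's admissible value set
        # (the inner disjunction of attribute-value pairs).
        return all(example[a] in hypertuple[a] for a in ATTRS)

    def consistent(hyperrelation, positives, negatives):
        # A hyperrelation (outer disjunction of hypertuples) is consistent
        # with the data if it covers every positive and no negative example.
        def covered(e):
            return any(covers(t, e) for t in hyperrelation)
        return (all(covered(p) for p in positives)
                and not any(covered(n) for n in negatives))

    def merge(t1, t2):
        # Componentwise union of two hypertuples: their join in the
        # lattice of hypertuples.
        return {a: t1[a] | t2[a] for a in ATTRS}

    # Toy data (hypothetical).
    positives = [{"shape": "round", "colour": "red"},
                 {"shape": "round", "colour": "green"}]
    negatives = [{"shape": "square", "colour": "red"}]

    # Most specific consistent hyperrelation: one hypertuple per positive
    # example, each value set a singleton.
    specific = [{a: {p[a]} for a in ATTRS} for p in positives]
    assert consistent(specific, positives, negatives)

    # One greedy merge step: join two hypertuples if the join still
    # excludes all negative examples.
    joined = merge(specific[0], specific[1])
    if not any(covers(joined, n) for n in negatives):
        assert consistent([joined], positives, negatives)

Repeatedly applying such merges until no further merge avoids the negatives yields increasingly general consistent hypotheses; requiring that no element can be merged further is the maximality condition that defines an E-set.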

[1] G. Grätzer, General Lattice Theory, 1978.

[2] Hung Son Nguyen, et al., From Optimal Hyperplanes to Optimal Decision Trees, 1998, Fundam. Informaticae.

[3] Tom M. Mitchell, Generalization as Search, 1982, Artif. Intell.

[4] Sebastian Thrun, et al., The MONK's Problems: A Performance Comparison of Different Learning Algorithms, CMU-CS-91-197, School of Computer Science, Carnegie Mellon University, 1991.

[5] Andrzej Skowron, et al., Boolean Reasoning for Feature Extraction Problems, 1997, ISMIS.

[6] Andrzej Skowron, et al., Extracting Laws from Decision Tables: A Rough Set Approach, 1995, Comput. Intell.

[7] Ivo Düntsch, et al., Statistical Evaluation of Rough Set Dependency Analysis, 1997, Int. J. Hum. Comput. Stud.

[8] Ivo Düntsch, et al., Classificatory Filtering in Decision Systems, 2000, Int. J. Approx. Reason.

[9] Ivo Düntsch, et al., Simple Data Filtering in Rough Set Systems, 1998, Int. J. Approx. Reason.

[10] David A. Bell, et al., A Lattice Machine Approach to Automated Casebase Design: Marrying Lazy and Eager Learning, 1999, IJCAI.

[11] Sinh Hoa Nguyen, et al., Pattern Extraction from Data, 1998, Fundam. Informaticae.

[12] David Haussler, Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework, 1988, Artif. Intell.

[13] David A. Bell, et al., Data Reduction Based on Hyper Relations, 1998, KDD.

[14] Andrzej Skowron, et al., Discovery of Data Patterns with Applications to Decomposition and Classification Problems, 1998.

[15] Tom M. Mitchell, Version Spaces: A Candidate Elimination Approach to Rule Learning, 1977, IJCAI.

[16] Haym Hirsh, Generalizing Version Spaces, 1994, Machine Learning.

[17] Tom Michael Mitchell, Version Spaces: An Approach to Concept Learning, 1979.

[18] Michèle Sebag, Delaying the Choice of Bias: A Disjunctive Version Space Approach, 1996, ICML.