Learning Trees and Rules with Set-Valued Features

In most learning systems, examples are represented as fixed-length "feature vectors" whose components are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white-and-black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis-familiaris, and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum's "infinite attribute" representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and problems that arise in propositionalizing first-order representations lend themselves to set-valued features.
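To make the representation concrete, here is a minimal sketch (not the paper's code; all names are illustrative) of an example with nominal and set-valued features, where a tree or rule learner would split on a membership test such as "white in color" rather than an equality test on a nominal value:

```python
# Illustrative sketch of the set-valued feature representation described above.

example = {
    "size": "small",                      # nominal feature
    "species": "canis-familiaris",        # nominal feature
    "color": {"white", "black"},          # set-valued feature
}

def test(example, feature, value):
    """One candidate split primitive: equality for nominal features,
    membership for set-valued features."""
    v = example[feature]
    return value in v if isinstance(v, (set, frozenset)) else v == value

# A text-categorization instance can likewise be encoded with a single
# set-valued feature holding the words of the document, so the set of
# possible "attributes" need not be fixed in advance.
doc = {"words": {"learning", "trees", "rules", "features"}}

print(test(example, "color", "white"))   # True
print(test(doc, "words", "rules"))       # True
```

Under this sketch, a learner need only enumerate the strings that actually occur in the training examples when proposing membership tests, which is what makes the connection to the infinite-attribute setting natural.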
