Learning Trees and Rules with Set-Valued Features

In most learning systems, examples are represented as fixed-length "feature vectors" whose components are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white-and-black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis-familiaris, and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum's "infinite attribute" representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and problems that arise in propositionalizing first-order representations lend themselves to set-valued features.
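To make the representation concrete, here is a minimal sketch (not the paper's code; all names are illustrative) of an example with nominal and set-valued features, where a tree or rule learner would split on a membership test such as "white in color" rather than an equality test on a nominal value:

```python
# Illustrative sketch of the set-valued feature representation described above.

example = {
    "size": "small",                      # nominal feature
    "species": "canis-familiaris",        # nominal feature
    "color": {"white", "black"},          # set-valued feature
}

def test(example, feature, value):
    """One candidate split primitive: equality for nominal features,
    membership for set-valued features."""
    v = example[feature]
    return value in v if isinstance(v, (set, frozenset)) else v == value

# A text-categorization instance can likewise be encoded with a single
# set-valued feature holding the words of the document, so the set of
# possible "attributes" need not be fixed in advance.
doc = {"words": {"learning", "trees", "rules", "features"}}

print(test(example, "color", "white"))   # True
print(test(doc, "words", "rules"))       # True
```

Under this sketch, a learner need only enumerate the strings that actually occur in the training examples when proposing membership tests, which is what makes the connection to the infinite-attribute setting natural.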
