Learning decision tree classifiers from attribute value taxonomies and partially specified data

We consider the problem of learning to classify partially specified instances, i.e., instances that are described in terms of attribute values at different levels of precision, using user-supplied attribute value taxonomies (AVT). We formalize the problem of learning from AVT and data, and present an AVT-guided decision tree learning algorithm (AVT-DTL) that learns classification rules at multiple levels of abstraction. The proposed approach generalizes existing techniques for dealing with missing values to handle instances with partially missing values. We present experimental results demonstrating that AVT-DTL can learn robust, high-accuracy classifiers from partially specified examples. Our experiments also demonstrate that AVT-DTL outperforms standard decision tree algorithms (C4.5 and its variants) when applied to data with missing attribute values, and that it produces substantially more compact decision trees than those obtained by the standard approach.
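
To make the notion of a partially specified value concrete, the following is a minimal sketch, not the AVT-DTL algorithm itself. It assumes a toy taxonomy over a hypothetical "color" attribute, stored as a child-to-parent map; the names `PARENT`, `leaves_under`, and `is_partially_specified` are illustrative and do not come from the paper. A value is treated as partially specified when it is an internal node of the taxonomy, so that it stands for several possible leaf values; a fully missing value would correspond to the root.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "red": "warm", "orange": "warm",
    "blue": "cool", "green": "cool",
    "warm": "color", "cool": "color",
}


def children(value):
    """Return the direct children of a taxonomy node, in sorted order."""
    return sorted(c for c, p in PARENT.items() if p == value)


def leaves_under(value):
    """Return the fully specified (leaf) values that a node may stand for."""
    kids = children(value)
    if not kids:
        return [value]  # a leaf stands only for itself
    result = []
    for kid in kids:
        result.extend(leaves_under(kid))
    return result


def is_partially_specified(value):
    """A value is partially specified if it covers more than one leaf."""
    return len(leaves_under(value)) > 1
```

Under these assumptions, `is_partially_specified("warm")` is true because "warm" could mean either "orange" or "red", while `is_partially_specified("red")` is false.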
