Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction

Most learning algorithms for data-driven induction of pattern classifiers (e.g., the decision tree algorithm) represent input patterns at a single level of abstraction, usually in the form of an ordered tuple of attribute values. However, in many applications of inductive learning (e.g., scientific discovery), users often need to explore a data set at multiple levels of abstraction and from different points of view. Each point of view corresponds to a set of ontological (and representational) commitments regarding the domain of interest. The choice of an ontology induces a set of representations of the data and a set of transformations of the hypothesis space. This paper formalizes the problem of inductive learning using ontologies and data; describes an ontology-driven decision tree learning algorithm that learns classification rules at multiple levels of abstraction; and presents preliminary results that demonstrate the feasibility of the proposed approach.
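
To make the idea of representing the same data at multiple levels of abstraction concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a single attribute whose values are organized in an is-a taxonomy; the taxonomy, the class labels, and the names `PARENT`, `abstract_value`, and `represent` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical is-a taxonomy over the values of one attribute (child -> parent).
PARENT: Dict[str, Optional[str]] = {
    "any": None,
    "citrus": "any",
    "berry": "any",
    "orange": "citrus",
    "lemon": "citrus",
    "strawberry": "berry",
    "blueberry": "berry",
}


def path_from_root(value: str) -> List[str]:
    """Return the chain of taxonomy nodes from the root down to `value`."""
    path = [value]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path[::-1]


def abstract_value(value: str, level: int) -> str:
    """Return the ancestor of `value` at depth `level` (root is depth 0).

    If `value` sits above the requested depth, it is returned unchanged.
    """
    path = path_from_root(value)
    return path[min(level, len(path) - 1)]


def represent(rows: List[Tuple[str, str]], level: int) -> List[Tuple[str, str]]:
    """Re-express (attribute value, class label) rows at one abstraction level."""
    return [(abstract_value(value, level), label) for value, label in rows]


# Hypothetical labelled examples at the most specific level.
ROWS = [
    ("orange", "A"),
    ("lemon", "A"),
    ("strawberry", "B"),
    ("blueberry", "B"),
]

if __name__ == "__main__":
    for level in (0, 1, 2):
        print(level, represent(ROWS, level))
```

Each choice of `level` yields a different representation of the same examples. A decision tree learner run on each representation would then search a different hypothesis space, which is the effect the abstract attributes to the choice of ontology.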
