We propose a procedure for estimating DBLEARN's potential for knowledge discovery, given a relational database and a set of concept hierarchies. The procedure is most useful for evaluating alternative concept hierarchies for the same database. The DBLEARN knowledge discovery program uses an attribute-oriented inductive-inference method to discover potentially significant high-level relationships in a database. A concept forest, containing at most one concept hierarchy per attribute, defines the generalizations that DBLEARN can make for a database. We estimate the potential for discovery in a database by examining the complexity of the corresponding concept forest. Two heuristic measures are defined based on the number, depth, and height of the forest's interior nodes. Higher values of these measures indicate more complex concept forests and, arguably, more potential for discovery. Experimental results using a variety of concept forests and four commercial databases show that in practice both measures support quite reliable decisions; thus, the simpler of the two may be the more appropriate choice.
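To make "number, depth, and height of the interior nodes" concrete, here is a minimal sketch assuming a plain tree representation of each concept hierarchy. It computes the depth (distance from the root) and height (longest path down to a leaf) of every interior node, then derives two illustrative measures. The names `Concept`, `node`, `interior_nodes`, `measure_count`, and `measure_weighted`, the two formulas, and the province/age hierarchies are all hypothetical stand-ins chosen for illustration; they are not the measure definitions or data used with DBLEARN.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a concept hierarchy; leaves are raw attribute values."""
    name: str
    children: list[Concept] = field(default_factory=list)


def node(name: str, *children: Concept) -> Concept:
    """Hypothetical convenience constructor for small hand-built hierarchies."""
    return Concept(name, list(children))


def height(c: Concept) -> int:
    """Edges on the longest path from c down to a leaf (a leaf has height 0)."""
    return 0 if not c.children else 1 + max(height(k) for k in c.children)


def interior_nodes(root: Concept) -> list[tuple[str, int, int]]:
    """(name, depth, height) for every interior node; the root has depth 0."""
    out: list[tuple[str, int, int]] = []
    stack = [(root, 0)]
    while stack:
        c, depth = stack.pop()
        if c.children:  # only nodes with children are interior
            out.append((c.name, depth, height(c)))
            stack.extend((k, depth + 1) for k in c.children)
    return out


# A concept forest maps each attribute to at most one hierarchy (dict keys are unique).
def measure_count(forest: dict[str, Concept]) -> int:
    """Hypothetical measure 1: number of interior nodes in the whole forest."""
    return sum(len(interior_nodes(root)) for root in forest.values())


def measure_weighted(forest: dict[str, Concept]) -> int:
    """Hypothetical measure 2: weight each interior node by depth + height,
    i.e. the length of the longest root-to-leaf path passing through it."""
    return sum(d + h for root in forest.values() for _, d, h in interior_nodes(root))


# Two made-up alternative forests for the same (hypothetical) database.
layered = {
    "province": node("ANY",
                     node("West", node("BC"), node("AB")),
                     node("East", node("ON"), node("QC"))),
    "age": node("ANY", node("young"), node("old")),
}
flat = {
    "province": node("ANY", node("BC"), node("AB"), node("ON"), node("QC")),
    "age": node("ANY", node("young"), node("old")),
}

if __name__ == "__main__":
    for label, forest in [("layered", layered), ("flat", flat)]:
        print(label, measure_count(forest), measure_weighted(forest))
    # prints:
    # layered 4 7
    # flat 2 2
```

In this toy comparison both illustrative measures rank the layered forest above the flat one, since the extra West/East level gives more intermediate concepts to generalize to. The count-only measure needs no depth or height computation at all, which illustrates why a simpler measure can be attractive when the two tend to agree.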