Theoretical and Practical Considerations of Uncertainty and Complexity in Automated Knowledge Acquisition

Inductive machine learning has become an important approach to automated knowledge acquisition from databases. The disjunctive normal form (DNF), as the common analytic representation of decision trees and decision tables (rules), provides a basis for the formal analysis of uncertainty and complexity in inductive learning. A theory for general decision trees is developed based on C. Shannon's (1949) expansion of the discrete DNF, and a probabilistic induction system, PIK, is further developed for extracting knowledge from real-world data. We then combine formal and practical approaches to study how three important data characteristics, namely disjunctiveness, noise and incompleteness, affect the uncertainty and complexity of inductive learning. The combination of leveled pruning, leveled condensing and resampling estimation proves to be a powerful method for dealing with highly disjunctive and inadequate data. Finally, the PIK system is compared with other recent inductive learning systems on a number of real-world domains.
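For readers unfamiliar with the expansion mentioned above, the following is a minimal sketch of the Boolean (two-valued) case only. Shannon's expansion rewrites a function f around a variable x as f = x·f(x=1) + x̄·f(x=0), so expanding one variable at a time yields a decision tree whose internal nodes each test a single variable. The function names `cofactor`, `expand` and `evaluate`, the encoding of a DNF as a list of dicts, and the example formula are illustrative assumptions; they are not taken from the paper or from the PIK system, whose theory covers general discrete attributes.

```python
# Assumed encoding (hypothetical, for illustration only):
#   a DNF is a list of terms; each term is a dict {variable: required_bool}.
#   [] represents False; a DNF containing an empty term {} represents True.

def cofactor(dnf, var, value):
    """Restrict the DNF to the case var == value."""
    result = []
    for term in dnf:
        if var in term:
            if term[var] != value:
                continue  # this term is falsified, so drop it
            # the literal is satisfied, so remove it from the term
            term = {v: b for v, b in term.items() if v != var}
        result.append(term)
    return result


def expand(dnf, variables):
    """Build a decision tree by Shannon expansion in the given variable order.

    Leaves are True/False; internal nodes are (var, low_subtree, high_subtree).
    Assumes `variables` lists every variable that occurs in `dnf`.
    """
    if not dnf:
        return False
    if any(len(term) == 0 for term in dnf):
        return True
    var, rest = variables[0], variables[1:]
    low = expand(cofactor(dnf, var, False), rest)
    high = expand(cofactor(dnf, var, True), rest)
    if low == high:
        return low  # the variable does not matter on this branch
    return (var, low, high)


def evaluate(tree, assignment):
    """Follow the tree from the root to a leaf for one assignment."""
    while isinstance(tree, tuple):
        var, low, high = tree
        tree = high if assignment[var] else low
    return tree


# Example formula: (a AND b) OR (NOT a AND c)
dnf = [{"a": True, "b": True}, {"a": False, "c": True}]
tree = expand(dnf, ["a", "b", "c"])
print(tree)  # ('a', ('c', False, True), ('b', False, True))
```

The variable order passed to `expand` determines the shape and size of the resulting tree, which is one way the link between a DNF and tree complexity can be explored on small examples.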

[1] Wray L. Buntine. Inductive knowledge acquisition and induction methodologies, 1989, Knowl. Based Syst.

[2] Ching Y. Suen, et al. Analysis and Design of a Decision Tree Based on Entropy Reduction and Its Application to Large Character Set Recognition, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Barry H. Margolin, et al. An Analysis of Variance for Categorical Data, II: Small Sample Comparisons with Chi Square and other Competitors, 1974.

[4] Larry A. Rendell, et al. Empirical learning as a function of concept character, 2004, Machine Learning.

[5] L. A. Goodman, et al. Measures of association for cross classifications, 1979.

[6] Tim Niblett, et al. Constructing Decision Trees in Noisy Domains, 1987, EWSL.

[7] Giulia Pagallo, et al. Learning DNF by Decision Trees, 1989, IJCAI.

[8] Claude E. Shannon, et al. The synthesis of two-terminal switching circuits, 1949, Bell Syst. Tech. J.

[9] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[10] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[11] Paul Compton, et al. Inductive knowledge acquisition: a case study, 1987.

[12] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[13] Bernard M. E. Moret, et al. Decision Trees and Diagrams, 1982, CSUR.

[14] Peter Clark, et al. Rule Induction with CN2: Some Recent Improvements, 1991, EWSL.

[15] Tharam S. Dillon, et al. A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction, 1991, IEEE Trans. Pattern Anal. Mach. Intell.