Decision tree design using information theory

The theoretical models based on rate-distortion theory and prefix-coding analogies explain previously observed experimental phenomena reported in the literature. An application to edge detection is described, in which we primarily emphasize the inductive methodology rather than the domain application (image processing) per se. We conclude that inductive learning paradigms based on information-theoretic models are both theoretically well-behaved and useful in practical problems.

Rule induction from large data sets is currently receiving attention in the areas of machine learning and expert systems. Classifier design from labelled training samples is a problem which shares many characteristics with the rule induction problem. A recent paper by Bundy, Silver & Plummer (1985) provides a useful discussion of how the two problems relate to each other. The basic premise of many rule induction mechanisms, when the data are probabilistic rather than deterministic, is to induce a hierarchy or decision tree as a representation of the relationships between the attributes (evidence) and the classes (hypotheses). Hence general relationships between classes and attributes are induced or learned by the induction mechanism. Note that the terms attributes and classes occur more often in the pattern recognition literature than the terms evidence and hypotheses, which tend to be used in the artificial intelligence domain. For the purposes of this paper we adopt the former. When the attribute-class relationships are probabilistic rather than deterministic, the induction mechanisms which work best appear to be those based on statistical
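As an illustration of inducing attribute-class relationships from labelled samples, the following is a minimal sketch of one common information-theoretic criterion: choosing the attribute whose estimated mutual information with the class is largest. This excerpt does not state the exact criterion the paper uses, so the criterion, the function names (`entropy`, `mutual_information`, `best_attribute`), and the toy edge-detection data are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(attribute_values, classes):
    """Estimate I(A; C) = H(C) - H(C | A) from paired samples."""
    n = len(classes)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for a, c in zip(attribute_values, classes):
        groups.setdefault(a, []).append(c)
    # H(C | A): entropy within each group, weighted by group frequency.
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - h_cond


def best_attribute(samples, classes):
    """Return the attribute name with the largest estimated I(A; C)."""
    names = samples[0].keys()
    return max(
        names,
        key=lambda k: mutual_information([s[k] for s in samples], classes),
    )


# Hypothetical toy data: two discrete attributes per pixel, one class label.
samples = [
    {"gradient": "high", "texture": "rough"},
    {"gradient": "high", "texture": "smooth"},
    {"gradient": "low", "texture": "rough"},
    {"gradient": "low", "texture": "smooth"},
]
classes = ["edge", "edge", "no_edge", "no_edge"]

# In this toy data "gradient" determines the class exactly, so it is chosen.
print(best_attribute(samples, classes))
```

A tree-building procedure would typically apply such a selection step recursively to the subsets of samples produced by each attribute value; that recursion and any stopping or pruning rule are omitted here.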

[1]  Richard O. Duda, et al.  Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[2]  David Nitzan, et al.  Development of intelligent robots: Achievements and issues, 1985, IEEE J. Robotics Autom.

[3]  Padhraic Smyth, et al.  Decision tree design from a communication theory standpoint, 1988, IEEE Trans. Inf. Theory.

[4]  Ioannis Pitas, et al.  Edge Detectors Based on Nonlinear Filters, 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  I.E. Abdou, et al.  Quantitative design and evaluation of enhancement/thresholding edge detectors, 1979, Proceedings of the IEEE.

[6]  Jerome H. Friedman, et al.  A Recursive Partitioning Decision Rule for Nonparametric Classification, 1977, IEEE Transactions on Computers.

[7]  I. K. Sethi, et al.  Hierarchical Classifier Design Using Mutual Information, 1982, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Ching Y. Suen, et al.  Analysis and Design of a Decision Tree Based on Entropy Reduction and Its Application to Large Character Set Recognition, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Donald Michie, et al.  Current developments in expert systems, 1987.

[10]  D. Huffman.  A Method for the Construction of Minimum-Redundancy Codes, 1952.

[11]  Amiel Feinstein, et al.  Transmission of Information, 1962.

[12]  Larry S. Davis, et al.  A visual navigation system for autonomous land vehicles, 1987, IEEE J. Robotics Autom.

[13]  J. Canny.  Finding Edges and Lines in Images, 1983.

[14]  Allan P. White, et al.  Predictor: An Alternative Approach to Uncertain Inference in Expert Systems, 1985, IJCAI.

[15]  Leo Breiman, et al.  Classification and Regression Trees, 1984.

[16]  Philip A. Chou, et al.  Optimal pruning with applications to tree-structured source coding and modeling, 1989, IEEE Trans. Inf. Theory.

[17]  J. Ross Quinlan, et al.  Learning Efficient Classification Procedures and Their Application to Chess End Games, 1983.

[18]  Padhraic Smyth.  The application of information theory to problems in decision tree design and rule-based expert systems, 1988.

[19]  Padhraic Smyth, et al.  Information-Theoretic Rule Induction, 1988, ECAI.

[20]  R. Gray, et al.  Applications of information theory to pattern recognition and the design of decision trees and trellises, 1988.

[21]  Friedrich M. Wahl, et al.  Digital Image Signal Processing, 1987.

[22]  D. Marr, et al.  Theory of edge detection, 1979, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[23]  George Nagy, et al.  Decision tree design using a probabilistic model, 1984, IEEE Trans. Inf. Theory.

[24]  Donald Ervin Knuth, et al.  The Art of Computer Programming, 1968.

[25]  Alan Bundy, et al.  An Analytical Comparison of Some Rule-Learning Programs, 1985, Artif. Intell.