Admissible stochastic complexity models for classification problems

In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity to data-dependent parameters such as sample size, class entropy, and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.
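As background for the bounds mentioned above, the following is a minimal sketch of the two-part description length at the heart of Rissanen's stochastic complexity (minimum description length) framework, together with an illustrative admissibility condition. The specific threshold shown, which bounds the model's code length by N times the empirical class entropy, is an assumed simplification for exposition and not necessarily the exact bound derived in the paper.

```latex
% Two-part MDL code length for a candidate classifier M on N labeled samples D:
% total cost = cost of describing the model + cost of describing the data given it.
\[
  L(M, D) \;=\; L(M) \;+\; L(D \mid M)
\]
% Illustrative admissibility condition (an assumed form): a model is worth
% considering only if encoding it costs no more than encoding the N class
% labels directly, i.e. N times the empirical class entropy \hat{H}(C).
\[
  \mathcal{A}(N) \;=\; \bigl\{\, M \;:\; L(M) \,\le\, N\,\hat{H}(C) \,\bigr\}
\]
```

Under this reading, the admissible set shrinks as the labels become more predictable (lower class entropy) and grows with sample size, which matches the abstract's claim that admissibility depends on problem complexity, N, and prior belief.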
