Efficient Maximum Entropy Training for Statistical Object Recognition

In statistical pattern recognition, probabilistic models are used to assign observations to one of a set of predefined classes, for example images of handwritten digits to one of the classes ‘0’ to ‘9’. The principle of maximum entropy is a powerful framework for estimating class posterior probabilities in such tasks. It yields a conceptually simple and easily extensible model that allows a large number of free parameters to be estimated reliably. We show how to apply this framework to object recognition and compare the results to other state-of-the-art approaches in experiments on the well-known US Postal Service (USPS) handwritten digit recognition task. We also introduce a simple but effective heuristic for speeding up the algorithms used to determine the model parameters.
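To make the maximum entropy approach concrete, the following is a minimal sketch (not the authors' implementation) of a log-linear model of the class posterior, p(c|x) = exp(Σ_i λ_{c,i} f_i(x)) / Z(x), fit by gradient ascent on the log-likelihood, which is one standard way to train such models. The feature functions, learning rate, and data shapes here are illustrative assumptions, with raw pixel values standing in for the feature functions used in the paper.

```python
# Sketch of a maximum entropy (log-linear) classifier. All names and
# hyperparameters are illustrative assumptions, not taken from the paper.
import numpy as np

def softmax(scores):
    # Numerically stable softmax over the class axis.
    s = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, y, n_classes, lr=0.1, n_iter=200):
    """Fit lambda (n_classes x n_features) by maximizing the log-likelihood.
    X: (n_samples, n_features) feature matrix; y: integer class labels."""
    n, d = X.shape
    lam = np.zeros((n_classes, d))
    Y = np.eye(n_classes)[y]            # one-hot class targets
    for _ in range(n_iter):
        P = softmax(X @ lam.T)          # current posteriors p(c|x)
        # Log-likelihood gradient: observed minus expected feature counts.
        grad = (Y - P).T @ X / n
        lam += lr * grad
    return lam

def predict(lam, X):
    return softmax(X @ lam.T).argmax(axis=1)

# Toy usage on random data standing in for digit images.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
y = rng.integers(0, 10, size=100)
lam = train_maxent(X, y, n_classes=10)
print(predict(lam, X[:5]))
```

The key property of this model family is that the gradient vanishes exactly when the expected feature counts under the model match the observed counts, which is the maximum entropy constraint; iterative scaling methods solve the same fixed-point condition by a different update rule.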
