Extending Pictorial Structures for Object Recognition

The goal of this paper is to recognize various deformable objects from images. To this end, we extend the class of generative probabilistic models known as pictorial structures. This class of models is particularly suited to representing articulated structures, and has previously been used by Felzenszwalb and Huttenlocher for pose estimation of humans. We extend pictorial structures in three ways: (i) likelihoods are included for both the boundary and the enclosed texture of the animal; (ii) a complete graph is modelled (rather than a tree structure); (iii) it is demonstrated that the model can be fitted in polynomial time using belief propagation. We show examples for two types of quadrupeds, cows and horses. We achieve excellent recognition performance for cows, with an equal error rate of 3% for 500 positive and 5000 negative images.
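To make point (iii) concrete, the following is a minimal sketch of loopy belief propagation on a complete graph of parts. It is not the implementation used in the paper. It assumes the min-sum (negative log) form of the algorithm, a small discrete set of candidate labels per part, and a toy pairwise cost; the names `loopy_min_sum`, `unary`, `pairwise` and `potts` are hypothetical. With n parts and h candidate labels per part, each iteration costs O(n^2 h^2) operations, which is polynomial in both.

```python
def loopy_min_sum(unary, pairwise, n_iters=10):
    """Approximate the minimum-cost labelling on a complete graph.

    unary[i][a]          : cost of giving part i its candidate label a
    pairwise(i, j, a, b) : cost of part i taking label a while part j takes b
    Returns one label index per part (hypothetical interface, for illustration).
    """
    n = len(unary)
    # msgs[i][j][b] is the message from part i to part j about label b of part j.
    msgs = [[[0.0] * len(unary[j]) for j in range(n)] for i in range(n)]

    for _ in range(n_iters):
        new = [[[0.0] * len(unary[j]) for j in range(n)] for i in range(n)]
        for i in range(n):
            labels_i = range(len(unary[i]))
            # Unary cost of part i plus all messages currently arriving at part i.
            total = [unary[i][a] + sum(msgs[k][i][a] for k in range(n) if k != i)
                     for a in labels_i]
            for j in range(n):
                if j == i:
                    continue
                for b in range(len(unary[j])):
                    # Exclude the message that came from j itself, then minimise over a.
                    new[i][j][b] = min(total[a] - msgs[j][i][a] + pairwise(i, j, a, b)
                                       for a in labels_i)
                # Subtract the minimum so that message values stay bounded.
                low = min(new[i][j])
                new[i][j] = [v - low for v in new[i][j]]
        msgs = new  # synchronous update: every message uses the previous round

    result = []
    for i in range(n):
        belief = [unary[i][a] + sum(msgs[k][i][a] for k in range(n) if k != i)
                  for a in range(len(unary[i]))]
        result.append(min(range(len(belief)), key=belief.__getitem__))
    return result


# Toy example: three parts, two candidate labels each, and a cost of 2
# whenever two parts disagree (an assumed stand-in for a spatial prior).
def potts(i, j, a, b):
    return 0 if a == b else 2

unary = [[0, 5], [1, 0], [1, 0]]
print(loopy_min_sum(unary, potts))
```

On a graph with cycles, belief propagation is an approximate method and is not guaranteed to converge in general; in this toy example the messages settle after a few iterations and the result agrees with the exhaustive minimum.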
