Using temporal coherence to build models of animals

We describe a system that can build appearance models of animals automatically from a video sequence of the relevant animal with no explicit supervisory information. The video sequence need not have any form of special background. Animals are modeled as a 2D kinematic chain of rectangular segments, where the number of segments and the topology of the chain are unknown. The system detects possible segments, clusters segments whose appearance is coherent over time, and then builds a spatial model of such segment clusters. The resulting representation of the spatial configuration of the animal in each frame can be seen either as a track - in which case the system described should be viewed as a generalized tracker, that is capable of modeling objects while tracking them - or as the source of an appearance model which can be used to build detectors for the particular animal. This is because knowing a video sequence is temporally coherent - i.e. that a particular animal is present through the sequence - is a strong supervisory signal. The method is shown to be successful as a tracker on video sequences of real scenes showing three different animals. For the same reason it is successful as a tracker, the method results in detectors that can be used to find each animal fairly reliably within the Corel collection of images.

[1]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[2]  Oded Maron,et al.  Learning from Ambiguity , 1998 .

[3]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[4]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[5]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[6]  Sergey Ioffe,et al.  Human tracking with mixtures of trees , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[7]  Mitchell Marcus,et al.  Empirical Methods for Exploiting Parallel Texts , 2001 .

[8]  Martin J. Wainwright,et al.  Tree-based reparameterization for approximate inference on loopy graphs , 2001, NIPS.

[9]  Cordelia Schmid,et al.  Constructing models for content-based image retrieval , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[10]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[11]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Michael J. Black,et al.  Implicit Probabilistic Models of Human Motion for Synthesis and Tracking , 2002, ECCV.

[13]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[14]  James M. Coughlan,et al.  Finding Deformable Shapes Using Loopy Belief Propagation , 2002, ECCV.

[15]  Trevor Darrell Tracking with Non-Linear Dynamic Models , 2002 .

[16]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..