Unsupervised model selection for view-invariant object detection in surveillance environments

We propose a novel approach for view-invariant vehicle detection in traffic surveillance videos. Instead of building a monolithic object detector that models all possible viewpoints, we learn a large array of efficient view-specific models, each corresponding to a different camera view (source domain). When presented with an unseen viewpoint (target domain), the most closely related source-domain models are selected for detection using a novel, discriminatively trained distance metric that takes into account scene geometry, vehicle motion patterns, and the generalization ability of the models. Extensive experimental evaluation on a challenging test set, consisting of images collected from fifty different surveillance cameras, demonstrates that our unsupervised approach can outperform complex methods that use labeled training data from the target domain, in both speed and accuracy.
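The selection step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the distance metric or of the per-view features, so everything here is an assumption made for illustration: each camera view is summarized by a fixed-length descriptor vector (standing in for scene geometry and motion cues), the learned metric is approximated by a weighted Euclidean distance with a hypothetical `weights` vector, and the function name `select_source_models` and the parameter `k` are invented for this example.

```python
import numpy as np

def select_source_models(target_desc, source_descs, weights, k=3):
    """Return indices of the k source views closest to the target view.

    Assumption: a weighted Euclidean distance stands in for the
    discriminatively trained metric, whose actual form is not given
    in the abstract.
    """
    target_desc = np.asarray(target_desc, dtype=float)
    source_descs = np.asarray(source_descs, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Per-feature differences between each source view and the target view.
    diffs = source_descs - target_desc            # shape: (n_sources, n_features)
    # Weighted Euclidean distance from the target to every source view.
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    # Smallest distances first; keep the k nearest source models.
    nearest = np.argsort(dists)[:k]
    return nearest, dists


if __name__ == "__main__":
    # Hypothetical descriptors for four source camera views.
    sources = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.2]]
    target = [0.0, 0.0]
    idx, d = select_source_models(target, sources, weights=[1.0, 1.0], k=2)
    print(idx, d[idx])
```

The detectors associated with the returned indices would then be applied to the target view; how their outputs are combined is not described in the abstract and is left out of this sketch.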
