A framework for unsupervised training of object detectors from unlabeled surveillance video

Object detection is a critical step in automated surveillance. Object detectors are commonly built with statistical learning techniques trained on large annotated data sets. However, because any training data set is inevitably limited, this supervised approach is unsuitable for building a generic surveillance system that must handle a wide variety of scenes, objects, and camera setups. As a step towards a more generic object detection solution, we propose an unsupervised method that learns and detects the dominant object class in a general dynamic scene observed by a static camera. First, a coarse object detector identifies candidate dominant objects from the motion segmentation results obtained for the observed scene. Next, clustering and cluster validation refine the output of the coarse detector and select a subset of it that can be used to train a more sophisticated dominant object detector. Finally, the trained detector is deployed to find further instances of the dominant object class in the observed scene. We demonstrate the effectiveness of our method experimentally on four representative video sequences.
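The refinement and training stages described above can be illustrated with a minimal sketch. It is not the method of the paper: it assumes each candidate from the coarse motion-based detector has already been reduced to a hypothetical two-dimensional feature (blob width and height), uses a plain two-cluster k-means as a stand-in for the clustering step, a percentile cut on the distance to the centroid as a stand-in for cluster validation, and a simple distance threshold as a stand-in for the more sophisticated detector. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def two_means(X, iters=100):
    """Minimal 2-cluster k-means with a deterministic farthest-point init."""
    first = X[0]
    second = X[np.linalg.norm(X - first, axis=1).argmax()]
    centers = np.stack([first, second]).astype(float)

    def assign(c):
        # Distance of every sample to both centers, then nearest center index.
        return np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        # Keep the old center if a cluster happens to be empty.
        updated = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(2)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(centers), centers

def select_training_subset(X, labels, centers, keep_percentile=80):
    """Keep the most central members of the largest (dominant) cluster."""
    dominant = np.bincount(labels, minlength=2).argmax()
    members = X[labels == dominant]
    dist = np.linalg.norm(members - centers[dominant], axis=1)
    return members[dist <= np.percentile(dist, keep_percentile)]

def fit_detector(train, max_z=3.0):
    """Return a function accepting features close to the training subset."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-9  # avoid division by zero

    def detect(x):
        z = (np.asarray(x, dtype=float) - mean) / std
        return bool(np.linalg.norm(z) < max_z)

    return detect

# Synthetic candidates (assumed data): 80 dominant objects, 20 clutter blobs.
rng = np.random.default_rng(0)
dominant_blobs = rng.normal([40.0, 90.0], 3.0, size=(80, 2))
clutter_blobs = rng.normal([120.0, 60.0], 3.0, size=(20, 2))
candidates = np.vstack([dominant_blobs, clutter_blobs])

labels, centers = two_means(candidates)
train = select_training_subset(candidates, labels, centers)
detect = fit_detector(train)

print(len(train), detect([40.0, 90.0]), detect([120.0, 60.0]))
```

In this toy setting the clutter falls into the smaller cluster and is discarded before training, which mirrors the idea of training only on a validated subset of the coarse detector's output.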
