Content-Based Retrieval of Functional Objects in Video Using Scene Context

Functional object recognition in video is an emerging problem for visual surveillance and video understanding. By functional objects, we mean objects with a specific purpose, such as a postman or a delivery truck, which are defined more by their actions and behaviors than by their appearance. In this work, we present an approach for content-based learning and recognition of the function of moving objects given video-derived tracks. In particular, we show that the semantic behaviors of movers can be captured in a location-independent manner by attributing them with features that encode their relations and actions with respect to scene context. By scene context, we mean local scene regions with distinct functionalities, such as doorways and parking spots, with which moving objects often interact. Based on these representations, functional models are learned from examples, and novel instances are then identified in unseen data. Furthermore, recognition in the presence of track fragmentation, due to imperfect tracking, is addressed by a boosting-based track-linking classifier. Our experimental results highlight both the promising and practical aspects of our approach.
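To make the representation concrete, the sketch below illustrates one way location-independent, context-relative track features and a boosted functional model could be assembled. It is a minimal sketch under stated assumptions, not the paper's implementation: the particular features (speed statistics, closest approach, mean distance, and dwell fraction with respect to doorways and parking-spot regions), the synthetic data, and the use of scikit-learn's AdaBoostClassifier are all illustrative choices.

```python
# Illustrative sketch (not the paper's implementation): encode a mover's track
# by its relations to scene-context regions, then learn a boosted functional model.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def context_features(track_xy, doorways, parking_spots, near_thresh=5.0):
    """track_xy: (T, 2) positions of one mover; doorways/parking_spots: (K, 2) region centers.
    Returns location-independent features: speed statistics plus relations to scene context."""
    steps = np.diff(track_xy, axis=0)
    speeds = np.linalg.norm(steps, axis=1)

    def relation(regions):
        # Distance from each track point to the nearest context region of this type.
        d = np.linalg.norm(track_xy[:, None, :] - regions[None, :, :], axis=2).min(axis=1)
        # Closest approach, mean distance, and fraction of time spent near the regions.
        return [d.min(), d.mean(), float(np.mean(d < near_thresh))]

    return np.array([speeds.mean(), speeds.max()]
                    + relation(doorways) + relation(parking_spots))

# Hypothetical training data: tracks labelled with functional categories
# (e.g. 0 = delivery vehicle, 1 = pedestrian commuter).
rng = np.random.default_rng(0)
doorways = rng.uniform(0, 100, size=(3, 2))
parking = rng.uniform(0, 100, size=(4, 2))
tracks = [rng.uniform(0, 100, size=(int(rng.integers(20, 60)), 2)) for _ in range(40)]
labels = rng.integers(0, 2, size=len(tracks))

X = np.stack([context_features(t, doorways, parking) for t in tracks])
model = AdaBoostClassifier(n_estimators=50).fit(X, labels)
print(model.predict(X[:5]))
```

Because the features are expressed relative to context regions rather than absolute image coordinates, a model trained in one scene can, in principle, be applied to another scene whose doorways and parking spots lie elsewhere; the boosting-based track-linking stage mentioned in the abstract would sit upstream of this step and is not shown here.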
