Spatiotemporal Coherence-Based Annotation Placement for Surveillance Videos

In this paper, we propose a novel annotation placement approach that reveals information about foreground objects in surveillance videos. Annotations are positioned by exploiting the spatiotemporal coherence between the annotations and the foreground objects. The annotation placement problem is formulated as an optimization over this coherence and is effectively solved using Markov random fields (MRFs). To the best of our knowledge, this is the first work to discuss and solve the annotation placement problem for surveillance videos by considering the relationships between annotations and the trajectories of foreground objects. As shown in the experiments, the proposed approach arranges annotations according to the moving trajectories of the foreground objects and prevents annotations from occluding one another and the foreground objects. It also outperforms state-of-the-art approaches both quantitatively and qualitatively.
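The MRF formulation can be illustrated with a small per-frame sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. Each annotation is an MRF node whose label is one of four hypothetical candidate slots around its object (`SLOTS`). The unary cost penalizes overlap with foreground boxes, plus a hypothetical `temporal_weight` penalty for leaving the slot used in the previous frame. The pairwise cost penalizes overlap between two annotations. The energy is minimized with iterated conditional modes (ICM), a simple local solver swapped in here for brevity; the abstract does not say which MRF solver the paper uses. The `(x, y, w, h)` box format, the cost terms, and the weights are all assumptions.

```python
Box = tuple[float, float, float, float]  # assumed format: (x, y, width, height)

# Hypothetical candidate slots: the annotation sits above, right of, below, or left of its object.
SLOTS = ("above", "right", "below", "left")


def overlap(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return dx * dy if dx > 0 and dy > 0 else 0


def candidate_box(obj: Box, slot: str, size: tuple[float, float]) -> Box:
    """Annotation box of the given size placed in one slot adjacent to the object box."""
    x, y, w, h = obj
    aw, ah = size
    if slot == "above":
        return (x, y - ah, aw, ah)
    if slot == "below":
        return (x, y + h, aw, ah)
    if slot == "right":
        return (x + w, y, aw, ah)
    return (x - aw, y, aw, ah)  # "left"


def place_annotations(objects, size, prev_labels=None, temporal_weight=10.0, max_iters=20):
    """Choose one slot index per object by minimizing an assumed MRF energy with ICM."""
    n, k = len(objects), len(SLOTS)
    boxes = [[candidate_box(o, s, size) for s in SLOTS] for o in objects]

    def unary(i, l):
        # Spatial term: how much this annotation would cover any foreground object.
        cost = sum(overlap(boxes[i][l], o) for o in objects)
        # Temporal term (hypothetical weight): discourage jumping away from last frame's slot.
        if prev_labels is not None and prev_labels[i] != l:
            cost += temporal_weight
        return cost

    def local_cost(i, l, labels):
        # Unary cost plus pairwise annotation-annotation overlap with all other current labels.
        pair = sum(overlap(boxes[i][l], boxes[j][labels[j]]) for j in range(n) if j != i)
        return unary(i, l) + pair

    # Initialize each node independently from its unary cost alone.
    labels = [min(range(k), key=lambda l: unary(i, l)) for i in range(n)]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            # ICM step: greedily re-label node i given its neighbors' current labels.
            best = min(range(k), key=lambda l: local_cost(i, l, labels))
            if local_cost(i, best, labels) < local_cost(i, labels[i], labels):
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels


if __name__ == "__main__":
    people = [(0, 100, 20, 40), (25, 100, 20, 40)]  # two nearby pedestrians
    print([SLOTS[l] for l in place_annotations(people, (30, 10))])
```

In a video, the sketch would be run once per frame with the previous frame's labels passed as `prev_labels`, so annotations tend to stay in the same slot along each object's trajectory unless staying would cause an overlap. ICM only reaches a local minimum; graph-cut or message-passing solvers are common alternatives for minimizing MRF energies.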
