Action localization and classification in long-distance surveillance

Suspicious human behaviors can be defined by the user; in long-distance imaging they may include, for instance, bending the body while walking, or crawling, as opposed to regular walking. State-of-the-art methods based on convolutional neural networks (CNNs) have generally dealt with “clean” signals, in which the object of interest is relatively close to the camera and is therefore fairly clear and easily distinguished from its surroundings. This makes it easier to capture detailed information about the object and its action. In relatively long-distance imaging (a few kilometers and above), however, additional difficulties degrade the performance of these tasks, because the captured videos are likely to be degraded by the atmospheric path, which causes blur and spatiotemporally varying distortions. Both types of degradation may reduce the ability to recognize actions. These effects become more significant as the imaging distance increases and as the objects of interest become smaller in the image. Objects imaged over long distances usually appear relatively small, so the range of actions that can be resolved is more limited, particularly under strong atmospheric effects. In this study, we perform action localization by first applying dedicated optical-flow processing and then using a variant of SSD (Single Shot MultiBox Detector) to regress and classify detection boxes in each video frame that potentially contain an action of interest.
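To make the "regress and classify detection boxes" step concrete, the following is a minimal sketch of how a standard SSD-style detector turns per-box regression offsets and class scores into detections. It is not the variant used in this study, whose details are not given here. It assumes the box parameterization of the original SSD formulation without variance scaling; the label set, the score threshold, and the function names `decode_box` and `detect` are hypothetical and chosen only for illustration.

```python
import math


def decode_box(default_box, offsets):
    """Decode SSD-style regression offsets into a corner-format box.

    default_box: (cx, cy, w, h) of a default (anchor) box, normalized to [0, 1].
    offsets:     (t_cx, t_cy, t_w, t_h) predicted for that default box.
    Returns (x_min, y_min, x_max, y_max).
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    # Center offsets are scaled by the default box size;
    # width and height are predicted in log space.
    cx = d_cx + t_cx * d_w
    cy = d_cy + t_cy * d_h
    w = d_w * math.exp(t_w)
    h = d_h * math.exp(t_h)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def detect(default_boxes, offsets, class_scores, labels, score_threshold=0.5):
    """Keep boxes whose best class is not background and scores high enough.

    Assumption: index 0 of each score vector is the background class.
    Returns a list of (box, label, score) tuples.
    """
    detections = []
    for box, off, scores in zip(default_boxes, offsets, class_scores):
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == 0 or scores[best] < score_threshold:
            continue
        detections.append((decode_box(box, off), labels[best], scores[best]))
    return detections


# Hypothetical label set and toy predictions for two default boxes.
LABELS = ["background", "walking", "crawling"]
default_boxes = [(0.5, 0.5, 0.2, 0.4), (0.2, 0.8, 0.1, 0.1)]
offsets = [(0.0, 0.0, 0.0, 0.0), (0.1, -0.1, 0.2, 0.0)]
class_scores = [(0.1, 0.2, 0.7), (0.9, 0.05, 0.05)]

detections = detect(default_boxes, offsets, class_scores, LABELS)
```

In this toy input only the first default box survives, labeled "crawling"; the second is dominated by the background score and is dropped. A full detector would normally also apply non-maximum suppression to overlapping boxes, which is omitted here for brevity.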
