Online Detection of Action Start via Soft Computing for Smart City

Soft computing is evolving rapidly thanks to advances in artificial intelligence, especially deep learning. Video surveillance technologies built on soft computing, such as image processing, computer vision, and pattern recognition, can be combined with cloud computing to sustain and greatly enhance the construction of smart cities. In this article, we focus on the online detection of action start, a video understanding and analysis task that is critical to multimedia security in smart cities. We propose a novel model to tackle this problem and achieve state-of-the-art results on the benchmark THUMOS14 data set.
