A Fast Filtering Mechanism to Improve Efficiency of Large-Scale Video Analytics

Surveillance cameras are ubiquitous. Emerging full-feature object-detection models can analyze surveillance videos with high accuracy, but they are computationally expensive. Applying these models directly to practical deployments with large numbers of cameras is prohibitively costly. It is also wasteful and unnecessary, because user-defined anomalies occur rarely in these videos. We therefore propose FFS-VA, a multi-stage Fast Filtering Mechanism for Video Analytics, to make video analytics much more cost-effective. FFS-VA uses two stream-specialized filters and a cheap full-function model to filter out frames that do not contain user-defined events, reducing the number of frames that reach the full-feature model. FFS-VA uses a global feedback-queue approach to balance the processing speeds of the different filters, both within a stream and across streams. It also uses a dynamic batching technique to trade off throughput against latency, and it scales efficiently to multiple GPUs. We evaluate FFS-VA against the state-of-the-art YOLOv3 on the same hardware and video workloads. The experimental results show that, with a 12.88 percent target-object occurrence rate on two GPUs, FFS-VA supports up to 30 concurrent video streams (15× more than YOLOv3) in the online case and achieves a 10× speedup when analyzing a single stream offline, with an accuracy loss of less than 2 percent.
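
The general idea of a multi-stage filtering cascade can be illustrated with a minimal Python sketch. It assumes each filter stage exposes a scoring function and a threshold, that stages are ordered from cheapest to most expensive, and that a frame reaches the next stage only if it clears the current one. The names `Stage` and `run_cascade`, the toy scoring lambdas, and the list-of-floats frame representation are hypothetical illustrations, not FFS-VA's actual filters or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical frame representation: a flat sequence of feature values.
Frame = Sequence[float]


@dataclass
class Stage:
    """One filter stage in the cascade (hypothetical interface)."""
    name: str
    score: Callable[[Frame], float]  # cheap relevance score for a frame
    threshold: float                 # frames scoring below this are dropped


def run_cascade(
    frames: Iterable[Frame],
    stages: List[Stage],
    full_model: Callable[[Frame], bool],
) -> Tuple[List[int], Dict[str, int]]:
    """Return indices flagged by the full model and how many frames reached each stage."""
    reached: Dict[str, int] = {stage.name: 0 for stage in stages}
    reached["full_model"] = 0
    hits: List[int] = []
    for index, frame in enumerate(frames):
        rejected = False
        for stage in stages:
            reached[stage.name] += 1
            if stage.score(frame) < stage.threshold:
                rejected = True  # a cheaper stage dropped the frame
                break            # so no later (costlier) stage runs on it
        if rejected:
            continue
        reached["full_model"] += 1
        if full_model(frame):
            hits.append(index)
    return hits, reached


# Toy example: stage 1 mimics a change detector, stage 2 a cheap classifier.
stages = [
    Stage("diff", score=lambda f: max(f), threshold=0.5),
    Stage("cheap_model", score=lambda f: f[1], threshold=0.5),
]
frames = [[0.0, 0.0], [0.9, 0.1], [0.8, 0.9], [0.2, 0.95]]
hits, reached = run_cascade(frames, stages, full_model=lambda f: f[0] > 0.5)
print(hits)     # [2]
print(reached)  # {'diff': 4, 'cheap_model': 3, 'full_model': 2}
```

Because each rejection short-circuits the later stages, the per-stage counts shrink as frames move down the cascade, and the expensive model only sees frames that every cheaper stage let through. This is the effect the abstract attributes to FFS-VA's filters, shown here on toy data.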

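The throughput/latency trade-off behind dynamic batching can be sketched with a simple policy, assuming a linear per-batch cost model (a fixed launch overhead plus a per-frame cost). The function picks the largest batch that still fits a latency budget, so a deep queue yields large, high-throughput batches while a short queue is served immediately. The cost model, the names `choose_batch_size` and `throughput_fps`, and all parameter values are hypothetical; the abstract does not describe how FFS-VA actually selects batch sizes.

```python
def choose_batch_size(
    queue_len: int,
    latency_budget_ms: float,
    fixed_ms: float,
    per_frame_ms: float,
    max_batch: int,
) -> int:
    """Largest batch whose estimated latency fits the budget (assumed linear cost model)."""
    if queue_len <= 0:
        return 0  # nothing to process
    if fixed_ms + per_frame_ms > latency_budget_ms:
        return 1  # budget unreachable; still make progress one frame at a time
    fits_budget = int((latency_budget_ms - fixed_ms) // per_frame_ms)
    return max(1, min(queue_len, max_batch, fits_budget))


def throughput_fps(batch: int, fixed_ms: float, per_frame_ms: float) -> float:
    """Frames per second when running batches of a given size back to back."""
    return 1000.0 * batch / (fixed_ms + per_frame_ms * batch)


# Deep queue: the latency budget (100 ms), not the queue, caps the batch.
print(choose_batch_size(50, 100.0, 20.0, 5.0, max_batch=32))  # 16
# Short queue: serve what is waiting rather than wait for a full batch.
print(choose_batch_size(10, 100.0, 20.0, 5.0, max_batch=32))  # 10
# Larger batches amortize the fixed overhead.
print(throughput_fps(1, 20.0, 5.0), throughput_fps(16, 20.0, 5.0))  # 40.0 160.0
```

Raising `latency_budget_ms` allows larger batches and higher throughput at the cost of longer per-frame delay. A real system would measure its cost model rather than assume a linear one.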