Paper Information - A Fast Filtering Mechanism to Improve Efficiency of Large-Scale Video Analytics

Surveillance cameras are ubiquitous. Emerging full-feature object-detection models can analyze surveillance videos with high accuracy, but they are computationally expensive. Applying these models directly in practical scenarios with large numbers of cameras is prohibitively expensive. It is also wasteful and unnecessary, because user-defined anomalies occur rarely in these videos. We therefore propose FFS-VA, a multi-stage Fast Filtering Mechanism for Video Analytics, to make video analytics much more cost-effective. FFS-VA filters out frames that do not contain user-defined events using two stream-specialized filters and a cheap full-function model, reducing the number of frames that reach the full-feature model. FFS-VA uses a global feedback-queue approach to balance the processing speeds of the different filters, both within a stream and across streams. FFS-VA also uses a dynamic batch technique to trade off throughput against latency, and it scales efficiently to multiple GPUs. We evaluate FFS-VA against the state-of-the-art YOLOv3 under the same hardware and video workloads. The experimental results show that, with a 12.88 percent target-object occurrence rate on two GPUs, FFS-VA can support up to 30 concurrent video streams (15× more than YOLOv3) in the online case and achieves a 10× speedup when analyzing a single stream offline, with an accuracy loss of less than 2 percent.
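
The multi-stage filtering idea in the abstract can be illustrated with a minimal sketch of a generic filter cascade: each frame passes through progressively more expensive checks and is dropped at the first one it fails, so only the survivors would reach the full-feature model. This is not the FFS-VA implementation. The stage names, the thresholds, and the precomputed frame fields (`motion`, `score`, `objects`) are all hypothetical stand-ins for real model outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical frame representation: a dict of precomputed signals that
# stands in for the outputs of real filter models.
Frame = Dict[str, float]


@dataclass
class Stage:
    """One filter in the cascade, with simple pass-through counters."""
    name: str
    keep: Callable[[Frame], bool]
    seen: int = 0
    passed: int = 0


def make_stages() -> List[Stage]:
    """Build three illustrative stages, ordered from cheapest to most expensive.

    The names and thresholds are assumptions made for this sketch only.
    """
    return [
        Stage("difference_filter", lambda f: f["motion"] > 0.1),
        Stage("specialized_classifier", lambda f: f["score"] > 0.5),
        Stage("cheap_detector", lambda f: f["objects"] >= 1),
    ]


def run_cascade(frames: Sequence[Frame], stages: Sequence[Stage]) -> List[Frame]:
    """Return the frames that pass every stage, dropping each frame at the
    first stage that rejects it."""
    survivors = []
    for frame in frames:
        for stage in stages:
            stage.seen += 1
            if not stage.keep(frame):
                break  # rejected: later, more expensive stages never run
            stage.passed += 1
        else:
            # No stage rejected the frame, so it would go to the full model.
            survivors.append(frame)
    return survivors


if __name__ == "__main__":
    demo_frames = [
        {"motion": 0.0, "score": 0.9, "objects": 2},
        {"motion": 0.5, "score": 0.1, "objects": 0},
        {"motion": 0.5, "score": 0.8, "objects": 3},
    ]
    demo_stages = make_stages()
    kept = run_cascade(demo_frames, demo_stages)
    for s in demo_stages:
        print(f"{s.name}: saw {s.seen}, passed {s.passed}")
    print(f"{len(kept)} of {len(demo_frames)} frames reach the full model")
```

The per-stage counters show why ordering matters: the cheaper a stage is and the more frames it rejects, the fewer frames the later, costlier stages have to process.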
