Optimizing Queries over Video via Lightweight Keypoint-based Object Detection

Recent advances in object detection based on convolutional neural networks have made it possible to analyze the growing volume of video data with high accuracy. However, slow inference is a major drawback of these video analysis systems because they rely on heavy object detectors. To address the computational and practical challenges of video analysis, we propose FastQ, a system for efficient querying over video at scale. Given a target video, FastQ automatically labels the category and number of objects in each frame. We introduce a novel lightweight object detector named FDet to improve the efficiency of the query system. First, a difference detector filters out the frames whose difference is less than the threshold. Second, FDet is employed to efficiently label the remaining frames. To reduce inference time, FDet predicts bounding boxes by detecting a center keypoint and a pair of corners from the feature map generated by a lightweight backbone. FDet completely avoids the complicated computation related to anchor boxes. Compared with state-of-the-art real-time detectors, FDet achieves superior performance, with 29.1% AP on the COCO benchmark at 25.3 ms. Experiments show that FastQ achieves speedups of 150x to 300x while maintaining more than 90% accuracy in video queries.
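
The two-stage flow described above (difference detector first, object detector second) can be illustrated with a minimal sketch. It is not the FastQ implementation: the mean absolute pixel difference metric, the comparison against the most recently labeled frame, the reuse of that frame's labels for skipped frames, and the names `mean_abs_diff`, `label_video`, and `detect` are all assumptions made for illustration. The `detect` callable stands in for a detector such as FDet.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A grayscale frame as rows of pixel intensities (hypothetical representation).
Frame = Sequence[Sequence[float]]
Labels = Dict[str, int]  # object category -> count in the frame


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def label_video(
    frames: Sequence[Frame],
    detect: Callable[[Frame], Labels],
    threshold: float,
) -> List[Labels]:
    """Label every frame, running the detector only on frames that changed enough.

    Assumption: a frame whose difference from the last labeled frame is below
    `threshold` inherits that frame's labels instead of being re-detected.
    """
    labels: List[Labels] = []
    reference: Optional[Frame] = None  # last frame the detector actually saw
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) >= threshold:
            labels.append(detect(frame))  # expensive path: run the detector
            reference = frame
        else:
            labels.append(labels[-1])  # cheap path: reuse previous labels
    return labels
```

In this sketch the saving comes entirely from how many frames take the cheap path, so the threshold trades query accuracy against the number of detector calls.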
