VRefine: Refining Massive Surveillance Videos for Efficient Storage and Fast Analysis

Ubiquitous cameras continuously produce enormous volumes of surveillance video, posing a major challenge to video analytics and storage systems. Although such videos are encoded and compressed by codecs that effectively reduce inter-/intra-frame redundancy at the pixel level, they still consume massive storage space and are therefore deleted periodically to reclaim capacity. To relieve hardware pressure on both computation and long-term storage, we propose VRefine, a video refining system that retains only the key content of surveillance videos to achieve high storage efficiency and fast video analytics. VRefine further eliminates the inter-/intra-frame content redundancy inherent in surveillance videos from the perspective of video analysis. Specifically, VRefine gradually reduces video size in three consecutive stages: removing all B frames and part of the P frames (KStore), condensing the remaining frames based on motion vectors (CStore), and extracting object semantics into a text database (SStore) using existing object detection models. We implement and evaluate VRefine. The experimental results show that, compared with the raw surveillance video, VRefine reduces storage size by 42.3%-94.3% and shortens analysis time by 46.5%-95.8%, with a slight and controllable reduction in prediction accuracy (3.0%).
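The first stage (KStore) can be illustrated with a minimal sketch that drops B frames and thins out P frames. The code below is not VRefine's implementation; it only demonstrates the frame-selection idea using the ffmpeg select filter, with an illustrative policy of keeping all I frames and roughly one in four P frames. The file paths, the sampling ratio p_keep_every, and the decode/re-encode step are assumptions made for the example (the actual system may operate directly on the coded bitstream without transcoding).

```python
# Illustrative sketch of the KStore idea (not VRefine's actual implementation):
# keep all I frames, keep roughly one in `p_keep_every` P frames, and drop
# every B frame, using the ffmpeg CLI `select` filter.
import subprocess


def kstore_sketch(src: str, dst: str, p_keep_every: int = 4) -> None:
    """Transcode `src` into `dst`, retaining I frames and a subset of P frames."""
    # In the select expression, an I frame always passes; a P frame passes only
    # when the running frame index n is a multiple of p_keep_every (a rough
    # approximation of uniform P-frame sampling); B frames satisfy neither term
    # and are therefore dropped.
    expr = f"select='eq(pict_type,I)+eq(pict_type,P)*not(mod(n,{p_keep_every}))'"
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,
            "-vf", expr,
            "-vsync", "vfr",  # keep the original timestamps of surviving frames
            "-an",            # the sketch ignores audio
            dst,
        ],
        check=True,
    )


# Hypothetical usage:
# kstore_sketch("camera01_raw.mp4", "camera01_kstore.mp4", p_keep_every=4)
```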