Paper Information - FALKON: Large-Scale Content-Based Video Retrieval Utilizing Deep-Features and Distributed In-memory Computing

In the big-data era, social media, 5G, high-speed Internet, and high-tech vision equipment have led to enormous volumes of video data being produced at an alarming rate. A fast and efficient content-based video retrieval system is not only desirable but essential in many domains with widespread applications. Conventional video retrieval techniques cannot meet current needs or keep pace with the rate of video production because of three core challenges: the sheer volume of video being produced, the complexity of video data, and the redundancy in video data. Because of these challenges, video operations and processing demand enormous computing power and resources. In this paper, we propose FALKON, a content-based video retrieval system that harnesses big-data technologies, deep learning, and distributed in-memory computation. First, we perform structural analysis on the videos; spatial and temporal features are then computed and indexed. We apply various optimization techniques to accelerate video processing and improve accuracy. We introduce VidRDD as the basic unit for distributed in-memory video computation. Furthermore, we introduce Video Query Maps as a relevance feedback mechanism to make the proposed system more reliable and user-friendly and to improve the retrieval results. We implement FALKON on Hadoop, HBase, Spark, and OpenCV. We achieve an average accuracy of 97.3%. Our evaluation results show that FALKON performs very well in terms of efficiency, scalability, computation time, and precision.
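
The pipeline summarized in the abstract (structural analysis, then feature computation and indexing, then retrieval) can be illustrated with a minimal single-machine sketch. This is not FALKON's implementation: the abstract gives no algorithmic detail, so the sketch assumes toy per-frame color histograms in place of deep features, a simple histogram-difference rule for shot segmentation, and a plain Python list in place of an HBase-backed index. All function names and the threshold value are hypothetical.

```python
import math

def l1_distance(a, b):
    """Sum of absolute differences between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def split_into_shots(frame_histograms, threshold=0.5):
    """Structural analysis (assumed rule): start a new shot wherever two
    consecutive frame histograms differ by more than `threshold`.
    Returns a list of (first_frame, last_frame) index pairs."""
    if not frame_histograms:
        return []
    shots, start = [], 0
    for i in range(1, len(frame_histograms)):
        if l1_distance(frame_histograms[i - 1], frame_histograms[i]) > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frame_histograms) - 1))
    return shots

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(video_id, frame_histograms, threshold=0.5):
    """Index one feature per shot, taken from the shot's first frame
    (a stand-in for keyframe deep features)."""
    return [
        {"video": video_id, "shot": (first, last), "feature": frame_histograms[first]}
        for first, last in split_into_shots(frame_histograms, threshold)
    ]

def search(index, query_feature, top_k=1):
    """Return the `top_k` indexed shots most similar to the query feature."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(entry["feature"], query_feature),
        reverse=True,
    )
    return ranked[:top_k]

# Toy video: two frames dominated by the first color bin, then two by the third.
video = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.0], [0.1, 0.1, 0.8], [0.1, 0.15, 0.75]]
index = build_index("toy_video", video)
best = search(index, [0.0, 0.1, 0.9])[0]
print(best["video"], best["shot"])
```

In a distributed setting, the per-video work in `build_index` is the kind of step that could be mapped across partitions of a Spark RDD, but how VidRDD actually partitions and processes video is not described in the abstract.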
