Spatial Keyframe Extraction Of Mobile Videos For Efficient Object Detection At The Edge

Advances in federated learning and edge computing advocate running deep learning models on edge devices for video analysis. However, video is captured at a frame rate too high for a typical model such as a CNN to process on the edge in real time. Any approach that feeds consecutive frames to the model compromises both the quality of the analysis (by missing important frames) and its efficiency (by processing redundant, near-identical frames). Focusing on outdoor urban videos, we use the spatial metadata of frames to select a subset of frames that maximizes the coverage area of the footage. Spatial keyframe extraction is formulated as an optimization problem in which the number of selected frames is the constraint and the covered area is the objective to maximize. We prove that this problem is NP-hard and devise several heuristics to solve it efficiently. Our approach is shown to yield a substantially higher hit ratio than conventional approaches.
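
The optimization described above resembles a budgeted maximum coverage problem, for which a greedy choice is a common heuristic. The sketch below is not the paper's algorithm, which the abstract does not specify. It assumes, purely for illustration, that each frame's viewable area has already been discretized into a set of grid cells; the names `greedy_keyframes`, `coverage`, and `Cell` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # hypothetical grid cell index (row, col)


def greedy_keyframes(coverage: Dict[int, Set[Cell]], k: int) -> List[int]:
    """Pick up to k frame ids, each time taking the frame that adds the most new cells.

    `coverage` maps a frame id to the set of grid cells that frame is assumed to see.
    """
    selected: List[int] = []
    covered: Set[Cell] = set()
    remaining = dict(coverage)  # shallow copy so the caller's dict is not modified
    for _ in range(k):
        best_id, best_gain = None, 0
        for frame_id in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = len(remaining[frame_id] - covered)
            if gain > best_gain:
                best_id, best_gain = frame_id, gain
        if best_id is None:  # no remaining frame adds new area
            break
        selected.append(best_id)
        covered |= remaining.pop(best_id)
    return selected


if __name__ == "__main__":
    # Toy footage: four frames looking at overlapping cells along one street.
    frames = {
        0: {(0, 0), (0, 1)},
        1: {(0, 1), (0, 2)},
        2: {(0, 2), (0, 3), (0, 4)},
        3: {(0, 4)},
    }
    print(greedy_keyframes(frames, 2))
```

With a budget of two frames, the sketch first takes frame 2 (three new cells) and then frame 0 (two new cells), skipping frames whose area is already covered. A greedy pass like this trades optimality for speed, which is the usual reason to prefer a heuristic when the exact problem is NP-hard.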
