Cloud-Assisted Multiview Video Summarization Using CNN and Bidirectional LSTM

The massive amount of video data produced by industrial surveillance networks poses various challenges for applications such as video summarization (VS), analysis, indexing, and retrieval. Multiview video summarization (MVS) is particularly challenging because of the sheer volume of data, redundancy, overlap between views, lighting variations, and inter-view correlations. To address these challenges, various methods based on low-level features and clustering-based soft computing techniques have been proposed, but these cannot fully exploit MVS. In this article, we achieve MVS by integrating deep-neural-network-based soft computing techniques into a two-tier framework. The first, online tier performs target-appearance-based shot segmentation and stores the resulting shots in a lookup table that is transmitted to the cloud for further processing. The second tier extracts deep features from each frame of a sequence in the lookup table and passes them to a deep bidirectional long short-term memory (DB-LSTM) network, which outputs probabilities of informativeness from which a summary is generated. Experimental evaluation on a benchmark dataset and on industrial surveillance data from YouTube confirms that our system performs better than state-of-the-art MVS methods.
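The abstract describes the two tiers only at a high level. As an illustration of the data flow, here is a minimal, standard-library-only Python sketch under stated assumptions; it is not the authors' implementation. Tier 1 is approximated by treating each maximal run of frames in which a target is detected as a shot, with the detector's output assumed to be given as per-frame booleans, and the lookup table is assumed to be a dict keyed by (view, shot index). Tier 2 substitutes arbitrary per-frame vectors for the CNN features and a single untrained bidirectional LSTM layer (random weights) for the trained deep, multi-layer DB-LSTM; frames whose informativeness probability meets a threshold form the summary. All names (`segment_shots`, `build_lookup_table`, `BiLSTMScorer`, `summarize`, `min_len`, `threshold`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Shot = Tuple[int, int]  # half-open frame range [start, end)


def segment_shots(target_present: List[bool], min_len: int = 2) -> List[Shot]:
    """Tier 1 (assumed rule): a shot is a maximal run of frames in which the
    target was detected; runs shorter than min_len frames are dropped."""
    shots: List[Shot] = []
    start = None
    for i, present in enumerate(target_present + [False]):  # sentinel closes a final run
        if present and start is None:
            start = i
        elif not present and start is not None:
            if i - start >= min_len:
                shots.append((start, i))
            start = None
    return shots


def build_lookup_table(views: Dict[str, List[bool]]) -> Dict[Tuple[str, int], Shot]:
    """Hypothetical lookup-table layout: (view name, shot index) -> frame range."""
    return {(view, k): shot
            for view, flags in views.items()
            for k, shot in enumerate(segment_shots(flags))}


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class _LSTM:
    """One untrained LSTM layer; weights are random, for illustration only."""

    def __init__(self, in_dim: int, hid: int, rng: random.Random):
        self.hid = hid
        cols = in_dim + hid + 1  # input, previous hidden state, bias
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(4 * hid)]

    def run(self, seq: List[List[float]]) -> List[List[float]]:
        H = self.hid
        h, c, out = [0.0] * H, [0.0] * H, []
        for x in seq:
            z = x + h + [1.0]
            a = [sum(w * v for w, v in zip(row, z)) for row in self.W]
            i = [_sigmoid(v) for v in a[:H]]            # input gate
            f = [_sigmoid(v) for v in a[H:2 * H]]       # forget gate
            g = [math.tanh(v) for v in a[2 * H:3 * H]]  # candidate cell state
            o = [_sigmoid(v) for v in a[3 * H:]]        # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            out.append(h)
        return out


class BiLSTMScorer:
    """Tier 2 sketch: a single-layer bidirectional LSTM maps per-frame feature
    vectors to informativeness probabilities in (0, 1)."""

    def __init__(self, in_dim: int, hid: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = _LSTM(in_dim, hid, rng)
        self.bwd = _LSTM(in_dim, hid, rng)
        self.w_out = [rng.uniform(-1.0, 1.0) for _ in range(2 * hid)]

    def score(self, feats: List[List[float]]) -> List[float]:
        hf = self.fwd.run(feats)
        hb = self.bwd.run(feats[::-1])[::-1]  # re-align backward states with frame order
        return [_sigmoid(sum(w * v for w, v in zip(self.w_out, f + b)))
                for f, b in zip(hf, hb)]


def summarize(table, feats_by_view, scorer: BiLSTMScorer, threshold: float = 0.5):
    """Keep (view, frame) pairs whose informativeness reaches the threshold."""
    summary = []
    for (view, _), (s, e) in sorted(table.items()):
        scores = scorer.score(feats_by_view[view][s:e])
        summary.extend((view, s + j) for j, p in enumerate(scores) if p >= threshold)
    return summary
```

In a real system the random weights would be learned during training and the feature vectors would be CNN activations. Selecting frames by a fixed probability threshold is just one simple way to turn per-frame scores into a summary, assumed here for illustration; the transfer of the lookup table to the cloud is left out of the sketch.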
