Event-Oriented 3D Convolutional Features Selection and Hash Codes Generation Using PCA for Video Retrieval

Extensive video surveillance networks gather an enormous, exponentially growing amount of data every day, and managing it is a challenging task that requires efficient and effective techniques for searching, indexing, and retrieval. Mainstream techniques focus on general-category videos, whereas the important events in surveillance require fine-grained event retrieval. In this paper, we introduce an event-oriented feature selection mechanism that utilizes an intermediate convolutional layer of a pre-trained 3D-CNN model, selected after a detailed investigation of its weights and its response to a particular event. The selected features represent an event semantically, and neurons that do not respond to the event are eliminated. However, the event-oriented convolutional features are very high-dimensional, so they require additional storage and slow down feature comparison during retrieval. Therefore, we generate compact binary codes from these features using the principal component analysis (PCA) algorithm, which makes our system more efficient at retrieving videos from large-scale databases. We evaluated our approach on the challenging events of the UCF101 and HMDB51 datasets, using both the original features and the generated compact codes, and achieved reduced execution time and better precision and recall scores.
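
To make the hashing step concrete, the following is a minimal sketch of one common way to turn high-dimensional features into binary codes with PCA: project the mean-centered features onto the leading principal components and threshold each projection at zero. The abstract does not state the binarization rule, the code length, or the feature dimensionality, so the sign threshold, the 32-bit length, the 512-dimensional random stand-in features, and the helper names `fit_pca_hash`, `encode`, and `hamming_rank` are all assumptions made for illustration.

```python
import numpy as np

def fit_pca_hash(features, n_bits):
    """Learn a PCA projection from an (n_samples, dim) feature matrix."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_bits].T  # projection matrix of shape (dim, n_bits)

def encode(features, mean, components):
    """Project features and threshold at zero to get one bit per component."""
    # Assumption: sign thresholding; the source does not specify the rule.
    return ((features - mean) @ components > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Random vectors stand in for the event-oriented 3D-CNN features.
rng = np.random.default_rng(0)
db_features = rng.normal(size=(200, 512))

mean, comps = fit_pca_hash(db_features, n_bits=32)
db_codes = encode(db_features, mean, comps)

# Query with a feature vector that is already in the database.
query_code = encode(db_features[5:6], mean, comps)[0]
order, dists = hamming_rank(query_code, db_codes)
```

Comparing 32-bit codes by Hamming distance replaces a 512-dimensional floating-point comparison per database item, which is the kind of storage and comparison saving the abstract attributes to compact codes.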
