Video classification based on ConvNet collaboration and feature selection

Video data is increasingly used in communication, health, education, and especially social media, and this growth brings new problems. Automatically classifying and detecting concepts in video is one of the more challenging of them. In this study, we propose a video classification system that combines deep convolutional neural networks (CNNs) with feature selection and data fusion techniques to improve classification accuracy. At the feature level, we apply Principal Component Analysis (PCA) as the feature selection method and, for data fusion, Discriminant Correlation Analysis (DCA), which incorporates class associations into the correlation analysis of the feature sets. Support Vector Machines (SVMs) are then trained on the new feature vectors obtained by applying these feature selection and data fusion methods to the outputs of different deep CNNs. The proposed method is tested on 38 concepts of the TRECVID 2013 SIN video task dataset. Our results show that classification accuracy improves by 4%, reaching 50.27%, when the proposed data fusion and feature selection techniques are used.
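The feature-level pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes NumPy and scikit-learn are available, replaces real CNN features with synthetic vectors, and uses plain concatenation as a stand-in for DCA, which scikit-learn does not provide. The function names, feature sizes, and the number of PCA components are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC


def make_synthetic_features(n_samples=200, dims=(64, 48), n_classes=2, seed=0):
    """Stand-in for features extracted from two CNNs (sizes are hypothetical)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_samples)
    feats = []
    for d in dims:
        # One random class centre per class, plus unit-variance noise per sample.
        centers = rng.normal(0.0, 1.0, size=(n_classes, d))
        feats.append(centers[labels] + rng.normal(0.0, 1.0, size=(n_samples, d)))
    return feats, labels


def fuse(pcas, feats):
    """Project each feature set with its own PCA, then concatenate.

    Concatenation is an assumption standing in for DCA-based fusion.
    """
    return np.hstack([p.transform(f) for p, f in zip(pcas, feats)])


def fit_pipeline(train_feats, train_labels, n_components=16):
    """Fit one PCA per feature set and a linear SVM on the fused vectors."""
    pcas = [PCA(n_components=n_components).fit(f) for f in train_feats]
    clf = SVC(kernel="linear").fit(fuse(pcas, train_feats), train_labels)
    return pcas, clf


feats, labels = make_synthetic_features()
split = 150  # first 150 samples for training, the rest for testing
train_feats = [f[:split] for f in feats]
test_feats = [f[split:] for f in feats]

pcas, clf = fit_pipeline(train_feats, labels[:split])
fused_test = fuse(pcas, test_feats)
pred = clf.predict(fused_test)
accuracy = float(np.mean(pred == labels[split:]))
print(f"fused dimension: {fused_test.shape[1]}, test accuracy: {accuracy:.2f}")
```

The PCA models are fitted on the training split only and then reused on the test split, so no test information leaks into the projection. Replacing `fuse` with a DCA implementation would bring the sketch closer to the method described in the abstract.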
