Paper information - Multimodal social media video classification with deep neural networks

Classifying videos according to their content is a common task across various contexts, as it allows effective content tagging, indexing and searching. In this work, we propose a general framework for video classification built on top of several neural network architectures. Following a multimodal approach, we extract both visual and textual features from videos and combine them in a final classification algorithm. When trained on a dataset of 30 000 social media videos and evaluated on 6 000 videos, our multimodal deep learning algorithm outperforms shallow single-modality classification methods by a large margin of up to 95%, achieving an overall accuracy of 88%.
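
The abstract does not say how the visual and textual features are combined or which classifier is used, so the following is only a minimal sketch of one common option: fusion by concatenation followed by a linear softmax classifier. It is not the paper's architecture. The feature arrays are synthetic stand-ins for the outputs of a visual network and a text embedding model, and every name, dimension, and hyperparameter here (`fuse_features`, `train_softmax`, the learning rate, and so on) is a hypothetical choice made for illustration.

```python
import numpy as np


def standardize(x):
    """Scale each feature column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)


def fuse_features(visual, textual):
    """Fuse modalities by concatenation: (n, dv) and (n, dt) -> (n, dv + dt)."""
    return np.concatenate([standardize(visual), standardize(textual)], axis=1)


def train_softmax(x, y, n_classes, lr=0.1, epochs=300):
    """Fit a linear softmax classifier with full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    """Return the most likely class index for each row of x."""
    return np.argmax(x @ w + b, axis=1)


# Synthetic stand-ins (assumption): each class has a random prototype per
# modality, and each video's features are that prototype plus unit noise.
rng = np.random.default_rng(0)
n_videos, n_classes = 300, 3
labels = rng.integers(0, n_classes, size=n_videos)
visual_centers = 3.0 * rng.normal(size=(n_classes, 8))
textual_centers = 3.0 * rng.normal(size=(n_classes, 4))
visual = visual_centers[labels] + rng.normal(size=(n_videos, 8))
textual = textual_centers[labels] + rng.normal(size=(n_videos, 4))

fused = fuse_features(visual, textual)
w, b = train_softmax(fused, labels, n_classes)
accuracy = float(np.mean(predict(fused, w, b) == labels))
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Concatenation is chosen here only because it is the simplest way to let one classifier see both modalities at once. The accuracy printed by this sketch describes the toy data and has no relation to the 88% reported above.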

Tomasz Trzcinski