Paper information - MovieScope: Movie trailer classification using Deep Neural Networks

This paper addresses identifying the genre of a movie by analyzing only the visual features of its trailer. The task seems trivial for a human; our endeavor is to create a vision system that can do the same accurately. We discuss the approaches we take and our experimental observations. The contributions of this work are: (1) we propose a neural network (based on VGG) that classifies movie trailers by genre; (2) we release a curated dataset, called the YouTube-Trailer Dataset, which has over 800 movie trailers spanning 4 genres. We achieve an accuracy of 80.1% with the spatial features and 85% when using an LSTM, and we set these results as the benchmark for this dataset. We have made the source code publicly available.
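The abstract does not say how frame-level outputs are combined into a single genre for a trailer. As a rough illustration only, the sketch below assumes a frame-level classifier (such as a VGG-style network) has already produced one vector of four class scores per sampled frame, and it averages the per-frame softmax probabilities before picking the top class. The genre labels, the `classify_trailer` helper, and the averaging rule are all hypothetical; they are not taken from the paper, whose best result uses an LSTM over frame features instead.

```python
import math

# Placeholder labels: the abstract says there are 4 genres but does not name them.
GENRES = ["genre_a", "genre_b", "genre_c", "genre_d"]


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_trailer(frame_logits):
    """Average per-frame class probabilities and return (label, mean_probs).

    frame_logits: one list of len(GENRES) scores per sampled frame.
    This averaging rule is an assumption made for illustration.
    """
    if not frame_logits:
        raise ValueError("need at least one frame")
    probs = [softmax(row) for row in frame_logits]
    n_frames = len(probs)
    mean_probs = [
        sum(p[k] for p in probs) / n_frames for k in range(len(GENRES))
    ]
    best = max(range(len(GENRES)), key=mean_probs.__getitem__)
    return GENRES[best], mean_probs


# Made-up scores for three frames of one trailer.
frame_logits = [
    [2.0, 0.1, 0.3, 0.0],
    [1.5, 0.2, 1.9, 0.1],
    [2.2, 0.0, 0.4, 0.3],
]
label, mean_probs = classify_trailer(frame_logits)
```

Averaging treats frames as independent and ignores their order, which is one plausible reason a sequence model such as an LSTM could do better on the same features.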

K. Sivaraman | Gautam Somappa
