On the Unsolved Problem of Shot Boundary Detection for Music Videos

This paper discusses open problems in detecting shot boundaries in music videos. The number of shots per second and the types of transition are considered discriminating features for music videos and potential multi-modal music features. By providing an extensive list of effects and transition types that are rare in cinematic productions but common in music videos, we emphasize the artistic use of transitions in music videos. Using examples, we discuss in detail the shortcomings of state-of-the-art approaches and suggest ways to address them.
