TransNet V2: An effective deep network architecture for fast shot transition detection

Although automatic shot transition detection approaches have been investigated for more than two decades, an effective, universal, human-level model has not yet been proposed. Even for common shot transitions such as hard cuts or simple gradual changes, the potential diversity of the analyzed video content can still lead to both false hits and false dismissals. Recently, deep learning-based approaches have significantly improved the accuracy of shot transition detection by using 3D convolutional architectures and artificially created training data. Nevertheless, one hundred percent accuracy remains an unreachable ideal. In this paper, we share the current version of our deep network TransNet V2, which reaches state-of-the-art performance on established benchmarks. A trained instance of the model is provided so that the community can use it immediately for highly efficient analysis of large video archives. Furthermore, the network architecture, as well as our experience with the training process, is detailed, including simple code snippets for convenient usage of the proposed model and visualization of results.
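
Shot transition detectors of this kind are typically used to split a video into shots. The sketch below is a minimal illustration under stated assumptions. It assumes that a model such as TransNet V2 emits one transition probability per frame. Frames above a threshold are treated as transition frames, and each maximal run of remaining frames becomes one shot. The helper name `predictions_to_shots` and the default threshold of 0.5 are hypothetical choices made for this example. They are not taken from the released code.

```python
from typing import List, Optional, Sequence, Tuple


def predictions_to_shots(
    probs: Sequence[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Group frames into (first_frame, last_frame) shot intervals.

    Hypothetical helper: assumes one transition probability per frame.
    Frames with a probability above `threshold` are treated as transition
    frames and excluded from shots. Every maximal run of non-transition
    frames is returned as one inclusive interval.
    """
    shots: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the currently open shot
    for i, p in enumerate(probs):
        if p > threshold:
            # Transition frame: close the shot that is currently open, if any.
            if start is not None:
                shots.append((start, i - 1))
                start = None
        elif start is None:
            # First non-transition frame after a transition (or at frame 0).
            start = i
    # Close the final shot if the video does not end on a transition.
    if start is not None:
        shots.append((start, len(probs) - 1))
    return shots


if __name__ == "__main__":
    # A single-frame spike at index 2, as a hard cut might produce.
    print(predictions_to_shots([0.1, 0.1, 0.9, 0.1, 0.1]))  # → [(0, 1), (3, 4)]
```

In this convention, a gradual transition spanning several high-probability frames is dropped entirely from both neighbouring shots. Other conventions, such as assigning transition frames to the preceding shot, are equally reasonable and depend on how the downstream application counts frames.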
