MvGAN Maximizing Time-Lag Aware Canonical Correlation for Baseball Highlight Generation

This paper presents a multi-view unsupervised generative adversarial network maximizing time-lag aware canonical correlation (MvGAN) for baseball highlight generation. MvGAN makes two contributions. First, it uses textual, visual, and audio features computed from tweets and videos as multi-view features; adopting these multi-view features makes MvGAN effective for generating highlights of baseball videos. Second, because there is a temporal difference between posted tweets and the events they describe, MvGAN introduces a novel feature embedding scheme that accounts for the time lag between the textual features and the other features. Specifically, the proposed method derives a time-lag aware canonical correlation maximization over these multi-view features, which is the main contribution of this paper. Furthermore, since MvGAN is an unsupervised method, it does not require a large amount of annotated training data, which gives the proposed method high applicability to real-world settings.
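To make the time-lag aware correlation idea concrete, the following is a minimal sketch (not the authors' implementation) of how one might search for the tweet-to-event lag: the textual feature sequence is shifted by candidate lags, and the lag whose first canonical correlation with the visual features is largest is kept. The feature matrices, the lag range, and the number of canonical components are all hypothetical placeholders.

```python
# Hypothetical sketch of time-lag aware canonical correlation search.
# Assumes per-segment feature sequences aligned on a common timeline.
import numpy as np
from sklearn.cross_decomposition import CCA


def first_canonical_correlation(X, Y):
    """First canonical correlation between two feature matrices."""
    cca = CCA(n_components=1)
    X_c, Y_c = cca.fit_transform(X, Y)
    return np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1]


def best_time_lag(text_feats, visual_feats, max_lag=30):
    """Find the lag (in segments) that best aligns tweets with video events.

    text_feats:   (T, d_text) per-segment textual features from tweets
    visual_feats: (T, d_vis)  per-segment visual features from video
    """
    T = len(text_feats)
    best_lag, best_rho = 0, -np.inf
    for lag in range(max_lag + 1):
        # A tweet posted at time t reacts to an event at time t - lag,
        # so shift the textual stream backwards by `lag` segments.
        X = text_feats[lag:]
        Y = visual_feats[:T - lag]
        rho = first_canonical_correlation(X, Y)
        if rho > best_rho:
            best_lag, best_rho = lag, rho
    return best_lag, best_rho  # estimated lag and its canonical correlation
```

In this sketch the lag is estimated by exhaustive search over a small window, which is a simple stand-in for the paper's embedding scheme; the same shifted-alignment idea would apply when maximizing the correlation jointly across all three views.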
