Important Scene Detection Of Baseball Videos Via Time-Lag Aware Deep Multiset Canonical Correlation Maximization

This paper presents a new method for detecting important scenes in baseball videos, based on maximizing the correlation between heterogeneous modalities via time-lag aware deep multiset canonical correlation analysis (Tl-dMCCA). The technical contributions of this paper are twofold. First, textual, visual, and audio features calculated from tweets and videos are adopted as multi-view time-series features. Because Tl-dMCCA, which uses these features, includes an unsupervised embedding scheme based on deep networks, the proposed method can flexibly express the relationships between heterogeneous features. Second, because there is a time lag between posted tweets and the multiple preceding events they correspond to, Tl-dMCCA takes these time-lag relationships into account. Specifically, we newly introduce a representation of such time lags into the derivation of the covariance matrices. By considering time lags via Tl-dMCCA, the proposed method correctly detects important scenes.
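
The idea of folding time lags into a covariance matrix can be illustrated with a minimal sketch. The code below is not the Tl-dMCCA derivation, which the abstract does not spell out. It assumes, purely for illustration, that a tweet feature at time t reacts to event features at times t, t-1, ..., t-max_lag, and that the lagged cross-covariances are averaged with equal weight. The function name `lagged_cross_covariance` and the parameter `max_lag` are hypothetical.

```python
import numpy as np


def lagged_cross_covariance(x, y, max_lag):
    """Average cross-covariance between x[t - lag] and y[t], lag = 0..max_lag.

    x: (T, dx) array, features of the earlier events (e.g. video or audio).
    y: (T, dy) array, features that react with a delay (e.g. tweets).
    Equal weighting of lags is an assumption made for this sketch.
    """
    x = x - x.mean(axis=0)  # center each view
    y = y - y.mean(axis=0)
    n_steps = x.shape[0]
    cov = np.zeros((x.shape[1], y.shape[1]))
    for lag in range(max_lag + 1):
        xs = x[: n_steps - lag]  # events at time t - lag
        ys = y[lag:]             # delayed reactions at time t
        cov += xs.T @ ys / (n_steps - lag - 1)
    return cov / (max_lag + 1)


# Toy data: the second view is the first view delayed by two steps.
rng = np.random.default_rng(0)
events = rng.standard_normal((500, 1))
tweets = np.roll(events, 2, axis=0)

no_lag = lagged_cross_covariance(events, tweets, max_lag=0)
with_lag = lagged_cross_covariance(events, tweets, max_lag=2)
```

On this toy data, the zero-lag covariance is close to zero because the two series are misaligned, while the lag-aware version recovers part of the dependence. In a multiset CCA setting, matrices of this kind would take the place of the ordinary between-view covariance blocks.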
