Retrieval of Similar Scenes Based on Multimodal Distance Metric Learning in Soccer Videos

This paper presents a method for retrieving similar scenes from far-view soccer videos, that is, unedited footage that captures a broad view of the field, based on multimodal distance metric learning. We extract visual and audio features from soccer video clips, and text features from the text data corresponding to those clips. For each feature type, a distance metric is learned with Laplacian Regularized Metric Learning and used to compute distances between the query scene and candidate scenes. Finally, these per-modality distances are integrated to determine the final ranking, which realizes multimodal retrieval of scenes similar to a query soccer video clip. Experimental results show the effectiveness of the proposed retrieval method.
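The retrieval step described above (one distance per modality, then an integrated ranking) can be illustrated with a minimal sketch. The abstract does not state how the distances are integrated, so the min-max normalization and summation below are assumptions made for illustration, not the authors' rule. The matrices passed as `M` stand in for metrics that would be learned by Laplacian Regularized Metric Learning; the learning step itself is not shown, and all function names are hypothetical.

```python
import math

def mahalanobis(q, x, M):
    """Distance sqrt((q - x)^T M (q - x)) under a metric matrix M.

    M is a placeholder for a learned metric (assumed positive semidefinite).
    """
    diff = [a - b for a, b in zip(q, x)]
    Md = [sum(m * d for m, d in zip(row, diff)) for row in M]
    sq = sum(d * v for d, v in zip(diff, Md))
    return math.sqrt(max(sq, 0.0))

def min_max(values):
    """Rescale distances to [0, 1] so modalities are comparable (assumed step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def fuse_and_rank(distances_per_modality):
    """Sum normalized per-modality distances; return candidate indices, most similar first.

    Summation is an assumed integration rule, not taken from the paper.
    """
    normalized = [min_max(d) for d in distances_per_modality]
    fused = [sum(col) for col in zip(*normalized)]
    return sorted(range(len(fused)), key=lambda i: fused[i])

# Toy example: three candidate scenes, two modalities with precomputed distances.
visual_distances = [1.0, 5.0, 0.0]
audio_distances = [0.0, 2.0, 4.0]
ranking = fuse_and_rank([visual_distances, audio_distances])
```

In this toy case the first candidate ranks highest because it is close to the query in both modalities, while a candidate that matches in only one modality is pushed down. A weighted sum or rank-based fusion would be equally plausible under the abstract's description.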
