Estimation of Scenes Contributing to Score in Tennis Video Using Attention

Image processing technology is increasingly used in sports. Video-based analysis of athletes and teams allows scientific, objective evaluation that is independent of experts' subjective assessments, so that individuals and teams can be evaluated with the same index. Recent technical progress has made it possible to detect and analyze whatever can be visually confirmed in sports video, and these technologies help coaches analyze movement. However, they cannot take over the coach's role, which is to judge what is important among the things that can be visually confirmed. A coach's judgment also has empirical and qualitative components, so a non-specialist cannot reproducibly identify the important parts. Against this background, we considered that it would be useful to extract the more important scenes of a game using quantitative information. Such technology can be applied to various sports, but in this paper we focus on tennis. The aim is to estimate which plays contributed most to the tennis game result (score or failure). We do not use supervisory information obtained from an empirical point of view, in order to eliminate the dataset dependency that arises when data are created from qualitative information. Specifically, we attempt to estimate attention with an unsupervised method, based on quantitative information such as the athletes' movement and the score result.
