Analysis of technical features in basketball video based on a deep learning algorithm

Abstract Video-based analysis of sports movement has significant application value: introducing digital video, human–computer interaction, and related technologies into sports training can greatly improve training efficiency. This paper studies the technical characteristics of players in basketball game videos and proposes a behavior analysis method based on deep learning. We first design a method to automatically extract the basketball court and its marking lines. Key frames in the video are then selected using a spatiotemporal scoring mechanism. Finally, we develop a behavior recognition and prediction method based on an encoder–decoder framework. The analysis results can be fed back to coaches and data analysts in real time to help them evaluate tactical and technical choices. Experiments are carried out on a large basketball video dataset. The results show that the proposed method effectively identifies player motion while achieving high behavior analysis accuracy.
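The abstract does not specify the form of the spatiotemporal scoring mechanism, but one common construction scores each frame by a spatial term (intensity variance within the frame) plus a temporal term (mean absolute difference from the previous frame), then keeps the top-scoring frames as key frames. The sketch below is an illustrative, pure-Python approximation of that idea, not the paper's actual method; the frame representation (small grayscale grids) and the equal weighting of the two terms are assumptions.

```python
def spatiotemporal_scores(frames):
    """Score frames (lists of rows of grayscale values) for keyframe selection.

    Assumed scoring (not from the paper): spatial term = per-frame intensity
    variance, temporal term = mean absolute difference from previous frame.
    """
    scores = []
    prev_flat = None
    for frame in frames:
        flat = [v for row in frame for v in row]
        mean = sum(flat) / len(flat)
        spatial = sum((v - mean) ** 2 for v in flat) / len(flat)
        if prev_flat is None:
            temporal = 0.0  # first frame has no predecessor
        else:
            temporal = sum(abs(a - b) for a, b in zip(flat, prev_flat)) / len(flat)
        scores.append(spatial + temporal)
        prev_flat = flat
    return scores


def select_keyframes(frames, k):
    """Return indices of the k highest-scoring frames, in temporal order."""
    scores = spatiotemporal_scores(frames)
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: two flat frames followed by two high-contrast frames.
clip = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[10, 0], [0, 10]],
    [[10, 0], [0, 10]],
]
print(select_keyframes(clip, 2))  # → [2, 3]
```

In a real system the spatial term would typically come from a learned saliency or feature map rather than raw variance, but the ranking-and-select structure stays the same.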
