Learning an event-oriented and discriminative dictionary based on an adaptive label-consistent K-SVD method for event detection in soccer videos

Abstract: In this paper, we formulate soccer video event detection as a sparse representation problem by learning a supervised, discriminative and event-oriented dictionary from weighted local features. To this end, we present a novel framework based on two ideas. First, we propose an approach for computing the representativeness of each video frame for each soccer event. Second, we propose an Adaptive Label-Consistent K-SVD (ALC-KSVD) algorithm that uses the computed representativeness of frames to learn an event-oriented and discriminative dictionary, which maps video frames into a sparse space. To improve discrimination among frames of different events, we propose a weighting method that identifies the local features that are most representative of each event category. The representativeness score of each frame is then calculated by aggregating the weighted local features within that frame; this score indicates the frame's degree of membership in each event. The representativeness score matrix, acting as a discriminative term, is combined with the reconstruction error to form an objective function that improves the discriminative ability of the sparse representation during dictionary learning. The resulting objective function is efficiently and optimally solved by the K-SVD algorithm. The representativeness score matrix, which is calculated automatically from the training samples, defines an adaptive correspondence between the dictionary atoms and the labels of the frames. We demonstrate the effectiveness of the proposed framework on the detection and classification of several soccer events through an extensive experimental investigation conducted on a large collection of video data. The experimental results indicate that our approach maintains good classification performance and outperforms state-of-the-art methods.
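The way a discriminative term can be folded into a single K-SVD problem may be easier to see in code. The sketch below is not the authors' implementation; it follows the stacking trick of the original Label-Consistent K-SVD and assumes the ALC-KSVD objective has the analogous form ||Y - DX||_F^2 + alpha * ||Q - AX||_F^2 with at most T nonzeros per column of X, where Q stands in for the representativeness score matrix. The names Y, Q, D, A, alpha and T, the matrix shapes, and the random data are all illustrative assumptions, and only the sparse coding step is shown (the per-atom SVD dictionary update is omitted).

```python
import numpy as np

def stack_terms(Y, Q, D, A, alpha):
    """Stack the reconstruction and discriminative terms into one problem.

    Assumed shapes: Y (d x N) frame features, Q (K x N) score matrix,
    D (d x K) dictionary, A (K x K) linear map applied to sparse codes.
    """
    w = np.sqrt(alpha)
    return np.vstack([Y, w * Q]), np.vstack([D, w * A])

def omp(D, y, T):
    """Orthogonal matching pursuit: greedy code with at most T nonzeros."""
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float).copy()
    for _ in range(T):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D[:, support] @ coef
    return x

# Hypothetical toy data standing in for frame features and scores.
rng = np.random.default_rng(0)
d, K, N, T, alpha = 8, 12, 20, 3, 4.0
Y = rng.normal(size=(d, N))
Q = rng.random(size=(K, N))
D0 = rng.normal(size=(d, K))
A0 = rng.normal(size=(K, K))

Y_s, D_s = stack_terms(Y, Q, D0, A0, alpha)
D_s = D_s / np.linalg.norm(D_s, axis=0)      # unit-norm stacked atoms
D_hat, A_hat = D_s[:d], D_s[d:] / np.sqrt(alpha)

# Sparse coding step against the stacked dictionary.
X = np.column_stack([omp(D_s, Y_s[:, i], T) for i in range(N)])

# The stacked residual equals the sum of the two original terms.
joint = np.linalg.norm(Y_s - D_s @ X) ** 2
split = (np.linalg.norm(Y - D_hat @ X) ** 2
         + alpha * np.linalg.norm(Q - A_hat @ X) ** 2)
```

Because the stacked residual equals the weighted sum of the two terms, any solver for the plain K-SVD problem can be applied to the stacked matrices unchanged, which is the design choice this sketch is meant to illustrate.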
