Event detection in soccer videos using unsupervised learning of Spatio-temporal features based on pooled spatial pyramid model

Most existing research on the semantic analysis of soccer videos relies on specialized approaches that bridge the semantic gap between low-level features and high-level events through a hierarchical structure. In this paper, we propose a novel data-driven model for the automatic recognition of important events in soccer broadcast videos, based on the analysis of spatio-temporal local features of video frames. The proposed algorithm explores the local visual content of video frames by focusing on spatial and temporal features learned in a low-dimensional, sparse transformed space. Without relying on mid-level features, the algorithm dynamically extracts the most informative semantic concepts/features, which improves the generality of the system. Dictionary learning plays an important role in sparse coding and in sparse representation-based event classification. We also present a novel dictionary learning method that learns several category-specific dictionaries from the detected shots of different view categories. To evaluate the feasibility and effectiveness of the proposed algorithm, we conduct an extensive experimental investigation of the analysis, detection, and classification of soccer events on a large collection of video data. Experimental results demonstrate the effectiveness of the proposed approach and indicate that it outperforms state-of-the-art methods.
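The abstract does not spell out how the category-specific dictionaries are learned or how classification uses them. As a rough illustration of the general idea it names (one dictionary per category plus sparse representation-based classification), the sketch below learns one dictionary per view category and assigns a new descriptor to the category whose dictionary reconstructs it with the smallest residual. It is a minimal NumPy sketch under stated assumptions, not the authors' method: sparse codes come from orthogonal matching pursuit (OMP), dictionaries are updated with the method of optimal directions (MOD, a simpler relative of K-SVD) in place of the paper's own learning rule, descriptors are plain vectors rather than the paper's pooled spatio-temporal features, and the function names (`omp`, `learn_dictionary`, `classify`), category labels (`long_view`, `close_up`) and toy data are hypothetical.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k atoms (columns) of D."""
    residual = x.copy()
    support, coef = [], np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom helps, so stop early
        support.append(j)
        # Refit x on all selected atoms (least squares), then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef


def learn_dictionary(X, n_atoms, k, n_iter=20, seed=0):
    """Learn one category-specific dictionary with MOD (assumed stand-in method).

    X has shape (dim, n_samples): descriptors from the shots of a single category.
    """
    rng = np.random.default_rng(seed)
    # Initialise the atoms with randomly chosen training descriptors.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding step: code every training sample with the current atoms.
        A = np.column_stack([omp(D, X[:, i], k) for i in range(X.shape[1])])
        # Dictionary update step (MOD): least-squares atoms D = X A^+.
        D = X @ np.linalg.pinv(A)
        # Renormalise atoms; unused (all-zero) atoms are left as zeros.
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.where(norms > 1e-12, norms, 1.0)
    return D


def classify(dicts, x, k):
    """Return the label whose dictionary reconstructs x with the smallest residual."""
    residuals = {label: float(np.linalg.norm(x - D @ omp(D, x, k)))
                 for label, D in dicts.items()}
    return min(residuals, key=residuals.get), residuals


if __name__ == "__main__":
    # Hypothetical toy data: each category's descriptors lie in its own 3-D subspace of R^20.
    rng = np.random.default_rng(1)
    B = {c: rng.standard_normal((20, 3)) for c in ("long_view", "close_up")}
    dicts = {c: learn_dictionary(b @ rng.standard_normal((3, 40)), n_atoms=6, k=3)
             for c, b in B.items()}
    x = B["long_view"] @ rng.standard_normal(3)
    label, residuals = classify(dicts, x, k=3)
    print(label, residuals)
```

The toy categories are separable by construction, so the demo only illustrates the decision rule: each category keeps its own compact dictionary, and a test descriptor goes to the category whose dictionary explains it with the lowest reconstruction error.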
