Automatically Compositing Soccer Video Highlights with a Core-Around Event Model

This paper presents an automatic highlighting approach for soccer match videos lasting over ninety minutes, requiring only the match video and the target length of the highlight video as input. First, we propose a core-around event model, which represents a highlight as a complex event with three components: the semantic relations among activities, the temporal relationships among activities, and the local motion appearance of each activity. Second, we detect the activities involved in a highlight from the match video using local motion appearance. Third, each candidate highlight segment is aligned with the model via a kernel activity extracted on the basis of the semantic relations, and is then assigned a matching score. Finally, we splice together the highlight segments with the highest matching scores. Experimental results demonstrate that the composed highlight videos are attractive episodes of the soccer matches and resemble those edited by professional editors.
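To make the pipeline concrete, the sketch below walks through the four steps with toy stand-ins: an activity detector, a kernel-activity selector driven by semantic relations, a temporal-order matching score, and a splicing loop that respects the target length. Every function, activity label, and weight here is a hypothetical placeholder for illustration only; the paper's actual local-motion detectors and core-around model parameters are not specified in the abstract.

```python
# Minimal sketch of the four-step highlight-composition pipeline.
# All names, labels, and weights below are assumptions for illustration.
from dataclasses import dataclass


@dataclass
class Activity:
    label: str         # e.g. "shot", "goal-celebration" (hypothetical labels)
    start: float       # seconds into the match video
    end: float
    confidence: float  # detector score from local motion appearance


def detect_activities(video_path):
    """Step 2 stand-in: detect activities by local motion appearance."""
    # A real system would run a motion-feature-based detector on the video.
    return [
        Activity("pass", 100.0, 104.0, 0.7),
        Activity("shot", 104.0, 107.0, 0.9),
        Activity("goal-celebration", 107.0, 120.0, 0.8),
    ]


def kernel_activity(activities, semantic_relations):
    """Step 3 stand-in: pick the kernel activity that anchors alignment.

    Here we take the activity whose label is most central under the assumed
    semantic relations; the paper derives this from its core-around model."""
    return max(activities,
               key=lambda a: semantic_relations.get(a.label, 0.0))


def match_score(segment, model_order):
    """Step 3 stand-in: score a candidate segment against the model.

    Rewards segments whose activities appear in the model's expected
    temporal order, weighted by detector confidence."""
    labels = [a.label for a in segment]
    score = sum(a.confidence for a in segment)
    for earlier, later in zip(model_order, model_order[1:]):
        if (earlier in labels and later in labels
                and labels.index(earlier) < labels.index(later)):
            score += 1.0
    return score


def compose_highlights(video_path, target_length):
    """End-to-end sketch: detect, align via kernel activity, score, splice."""
    semantic_relations = {"shot": 1.0, "goal-celebration": 0.8, "pass": 0.3}
    model_order = ["pass", "shot", "goal-celebration"]

    activities = detect_activities(video_path)
    kernel = kernel_activity(activities, semantic_relations)

    # Candidate segments: windows centred on each kernel-activity occurrence.
    candidates = [activities[max(0, i - 1): i + 2]
                  for i, a in enumerate(activities)
                  if a.label == kernel.label]
    scored = sorted(candidates,
                    key=lambda s: match_score(s, model_order), reverse=True)

    # Step 4: splice top-scoring segments until the target length is reached.
    highlights, total = [], 0.0
    for seg in scored:
        duration = seg[-1].end - seg[0].start
        if total + duration <= target_length:
            highlights.append((seg[0].start, seg[-1].end))
            total += duration
    return highlights


print(compose_highlights("match.mp4", 60.0))  # -> [(100.0, 120.0)]
```

The design point mirrored here is that alignment is anchored on a single kernel activity chosen from the semantic relations, so a candidate segment only needs to be matched once against the model rather than exhaustively at every temporal offset.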
