Tennis video abstraction from audio and visual cues

We propose a context-based model of video abstraction that exploits both audio and visual features, and we apply it to tennis TV programs. The method automatically produces different types of summaries of a given video, depending on the users' constraints or preferences. We first designed an efficient and accurate temporal segmentation of the video into segments that are homogeneous with respect to camera motion. We then introduce original visual descriptors related to the dominant and residual image motions. The different summary types are obtained by specifying adapted classification criteria, which involve audio features, to select the relevant segments to include in the video abstract. The proposed scheme has been validated on 22 hours of tennis videos.
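The selection step can be illustrated with a minimal sketch, given here only as an assumption-laden example and not as the scheme described above. It assumes each segment already carries a camera-motion label and a scalar audio relevance score, and that the user constraint is a maximum summary duration. The names `Segment`, `camera_motion`, `audio_score`, `select_segments`, and the threshold `min_audio_score` are all hypothetical, and the greedy ranking is a stand-in for the classification criteria.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float          # segment start time, in seconds
    end: float            # segment end time, in seconds
    camera_motion: str    # hypothetical label, e.g. "static" or "pan"
    audio_score: float    # hypothetical audio relevance score in [0, 1]

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_segments(segments, max_duration, min_audio_score=0.5):
    """Greedily pick the highest-scoring segments that fit the duration budget.

    This is an illustrative stand-in for a classification criterion:
    segments below the audio threshold are discarded, the rest are ranked
    by audio score, and the result is returned in chronological order.
    """
    candidates = [s for s in segments if s.audio_score >= min_audio_score]
    candidates.sort(key=lambda s: s.audio_score, reverse=True)

    chosen, total = [], 0.0
    for seg in candidates:
        if total + seg.duration <= max_duration:
            chosen.append(seg)
            total += seg.duration

    return sorted(chosen, key=lambda s: s.start)


# Toy data: values are invented for illustration only.
segments = [
    Segment(0.0, 10.0, "static", 0.9),
    Segment(10.0, 30.0, "pan", 0.2),
    Segment(30.0, 45.0, "static", 0.7),
    Segment(45.0, 50.0, "zoom", 0.8),
]
summary = select_segments(segments, max_duration=16.0)
```

Changing `max_duration` or `min_audio_score` yields a different summary from the same segmentation, which is the sense in which one segmentation can serve several summary types.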
