Detection of ball hits in a tennis game using audio and visual information

In this paper we describe a framework that improves the detection of ball hit events in tennis games by combining audio and visual information. Detecting the presence and timing of these events is crucial for understanding the game. However, neither modality gives satisfactory results on its own: audio information is often corrupted by noise and suffers from acoustic mismatch between the training and test data, while visual information is corrupted by complex backgrounds, camera calibration, and the presence of multiple moving objects. Our approach first tracks the ball visually to estimate a sequence of candidate ball positions, and then locates putative ball hits by analysing the ball's position along this trajectory. To handle the severe interference caused by false ball candidates, we smooth the trajectory by removing the frames in which there are no candidates and applying locally weighted linear regression. We use Gaussian mixture models to estimate the times of hits from the audio information, and then integrate the two sources of information in a probabilistic framework. Tests on three complete tennis games show significant improvements in detection over a range of conditions compared with using either modality alone.
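The trajectory-smoothing step can be illustrated with a minimal sketch. It assumes one ball coordinate per frame, with NaN marking frames that have no candidate, and it uses a plain tricube-weighted local linear fit without robustness iterations. The function names, the `frac` parameter, and the rule that treats a reversal in the smoothed coordinate as a putative hit are assumptions made for illustration; the abstract does not specify the criterion actually used.

```python
import numpy as np

def lowess_smooth(t, y, frac=0.3):
    """Locally weighted linear regression with a tricube kernel.

    Frames whose coordinate is NaN (no ball candidate) are dropped first.
    Assumes at least three retained frames with distinct time stamps.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(y)                      # remove frames with no candidate
    t, y = t[keep], y[keep]
    n = len(t)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours used per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(t - t[i])
        h = np.sort(d)[k - 1]                # bandwidth: k-th nearest distance
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Weighted least squares for the local model y ~ a + b * t.
        A = np.stack([np.ones(n), t], axis=1) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * t[i]
    return t, fitted

def direction_changes(t, y_smooth):
    """Return the times at which the smoothed coordinate reverses direction.

    Hypothetical stand-in for locating putative hits along the trajectory.
    """
    sign = np.sign(np.diff(y_smooth))
    idx = np.where(sign[:-1] * sign[1:] < 0)[0] + 1
    return t[idx]

# Synthetic V-shaped trajectory with a reversal at frame 20 and three
# frames in which no ball candidate was found.
frames = np.arange(41)
coord = np.abs(frames - 20).astype(float)
coord[[5, 6, 30]] = np.nan

ts, ys = lowess_smooth(frames, coord, frac=0.3)
hits = direction_changes(ts, ys)
print(hits)
```

A larger `frac` suppresses more false candidates but also rounds off the sharp reversal that marks a hit, so in practice the bandwidth would have to be tuned against the frame rate and the ball speed.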
