Automatic Real-Time Selection and Annotation of Highlight Scenes in Televised Soccer

We describe an online method for selecting and annotating highlight scenes in soccer matches being televised. The stadium crowd noise and the play-by-play announcer's voice are used as input signals. Candidate scenes for highlights are extracted from the crowd noise by dynamic thresholding and spectral envelope analysis. Using a dynamic threshold solves the problem in conventional methods of how to determine an appropriate threshold. Semantic-meaning information about the kind of play and the related team and player is extracted from the announcer's commentary by using domain-based rules. The information extracted from the two types of audio input is integrated to generate segment-metadata of highlight scenes. Application of the method to six professional soccer games has confirmed its effectiveness.