SEGMENTING TV SERIES INTO SCENES USING SPEAKER DIARIZATION

In this paper, we propose a novel approach to scene segmentation of TV series. Using the output of our existing speaker diarization system, any temporal segment of the video can be described as a binary feature vector. A straightforward segmentation algorithm then groups similar contiguous speaker segments into scenes. An additional visual-only, color-based segmentation is then used to refine this first segmentation. Experiments are performed on a subset of the Ally McBeal TV series and show promising results, obtained with a rule-free and generic method. For comparison purposes, the test corpus annotations and description are made available to the community.
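To make the speaker-based step concrete, the following is a minimal sketch of the general idea, not the implementation evaluated in this paper. It assumes the diarization output is a list of (start, end, speaker) turns, sets a vector component to 1 when that speaker talks at any point in a window, and compares adjacent windows with Jaccard similarity. The function names, the fixed window length, the similarity measure, and the threshold are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical diarization output format: (start_sec, end_sec, speaker_label)
Turn = Tuple[float, float, str]


def speaker_vector(turns: List[Turn], start: float, end: float,
                   speakers: List[str]) -> List[int]:
    """Binary vector: 1 if the speaker has a turn overlapping [start, end)."""
    active = {spk for s, e, spk in turns if s < end and e > start}
    return [1 if spk in active else 0 for spk in speakers]


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary vectors (assumed similarity measure)."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    # Two silent windows are treated as identical, so no boundary is placed.
    return inter / union if union else 1.0


def segment_scenes(turns: List[Turn], window: float = 30.0,
                   threshold: float = 0.2) -> List[float]:
    """Return candidate scene boundaries (seconds) where the set of active
    speakers changes sharply between two adjacent windows."""
    if not turns:
        return []
    speakers = sorted({spk for _, _, spk in turns})
    total = max(e for _, e, _ in turns)
    boundaries = []
    t = window
    while t < total:
        before = speaker_vector(turns, t - window, t, speakers)
        after = speaker_vector(turns, t, t + window, speakers)
        if jaccard(before, after) < threshold:
            boundaries.append(t)
        t += window
    return boundaries


if __name__ == "__main__":
    demo = [(0, 10, "A"), (10, 20, "B"), (20, 30, "A"),
            (30, 40, "C"), (40, 50, "D"), (50, 60, "C")]
    print(segment_scenes(demo))
```

In this toy input, speakers A and B alternate during the first 30 seconds and speakers C and D during the next 30, so the two windows share no speaker and a boundary is proposed between them. The color-based refinement mentioned above is not covered by this sketch.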
