MultiBIC: an improved speaker segmentation technique for TV shows

Speaker segmentation systems often fail to detect short segments, which leads to a high number of deletions (missed change-points) and harms system performance. This is a particular problem when segmenting multimedia content such as movies and TV shows, where dialogs among characters are very common. This paper presents a modification of the BIC algorithm that markedly reduces the number of deletions without increasing the number of false alarms. The modification, referred to as MultiBIC, assumes that two change-points are present in a window of data, whereas the conventional BIC approach assumes that there is just one. As a result, the system notices when a window contains more than one change-point and finds shorter segments than traditional BIC.
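
To make the one-versus-two change-point idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard full-covariance Gaussian form of the BIC difference, and it generalizes that form to several cut points by adding one penalty term per cut. That generalization, the function names `delta_bic` and `_half_n_logdet`, and the penalty weight `lam` are illustrative assumptions; the abstract does not give the exact MultiBIC formulation.

```python
import numpy as np


def _half_n_logdet(x):
    """Return (n / 2) * log|Sigma| for the ML full-covariance Gaussian fit of x."""
    n = x.shape[0]
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * logdet


def delta_bic(x, cuts, lam=1.0):
    """BIC gain of splitting window x (frames x dims) at the given cut indices.

    One cut corresponds to the conventional single change-point test.
    Two cuts illustrate the two change-point hypothesis (an assumed
    generalization, with one penalty term per additional Gaussian).
    A positive value favors the split hypothesis over a single Gaussian.
    """
    n, d = x.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    bounds = [0, *cuts, n]
    split = sum(_half_n_logdet(x[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return _half_n_logdet(x) - split - lam * len(cuts) * penalty


# Synthetic window: a short "speaker B" turn embedded between two "speaker A" turns.
rng = np.random.default_rng(0)
window = np.vstack([
    rng.normal(0.0, 1.0, (200, 2)),
    rng.normal(5.0, 1.0, (100, 2)),
    rng.normal(0.0, 1.0, (200, 2)),
])
one_cut = delta_bic(window, [250])        # single change-point hypothesis
two_cuts = delta_bic(window, [200, 300])  # two change-point hypothesis
```

In this synthetic window, a single cut cannot isolate the short middle turn, so the two-cut hypothesis explains the data much better. This mirrors the situation the abstract describes, in which a short segment inside a window is missed by a single change-point test.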
