An Adaptive BIC Approach for Robust Speaker Change Detection in Continuous Audio Streams

This paper addresses audio segmentation. We present a novel method for robust and accurate detection of acoustic change points in continuous audio streams. The segmentation procedure was developed as part of an audio diarization system for indexing broadcast news audio. In this approach, we aimed to remove the need for the predetermined decision thresholds for detecting segment boundaries that standard segmentation procedures usually require. The proposed procedure instead estimates the decision thresholds directly from the audio data currently being processed, which reduces the need for additional threshold tuning on development data. It employs change-detection methods from two well-established audio segmentation approaches based on the Bayesian Information Criterion (BIC). Combining methods from both approaches enabled us to tune the boundary-detection thresholds adaptively from the data being processed. All three segmentation procedures are tested and compared on a broadcast news audio database, where the proposed procedure shows its potential.
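For readers unfamiliar with BIC-based change detection, the following is a minimal sketch of the textbook ΔBIC test for a single candidate boundary, assuming each segment is modeled by one full-covariance Gaussian. It is not the adaptive procedure proposed in this paper. The function name `delta_bic` and the parameter `penalty_weight` are illustrative assumptions; the penalty weight plays the role of the decision threshold that standard procedures tune on development data.

```python
import numpy as np

def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for splitting `window` (frames x dims) at frame index `split`.

    Hypothetical helper: a positive value favors two separate Gaussians
    (a change point at `split`), a negative value favors a single Gaussian.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]

    def logdet_cov(x):
        # Maximum-likelihood (biased) full covariance of the frames in x.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        # slogdet returns (sign, log|det|) and avoids overflow in the determinant.
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of modeling the two halves separately.
    gain = 0.5 * (n * logdet_cov(window)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Extra free parameters of the second Gaussian: mean plus covariance.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty

# Synthetic illustration (assumed data, not from the paper's database).
rng = np.random.default_rng(0)
homogeneous = rng.normal(size=(400, 4))
with_change = np.vstack([rng.normal(size=(200, 4)),
                         rng.normal(loc=3.0, size=(200, 4))])
print(delta_bic(homogeneous, 200), delta_bic(with_change, 200))
```

In this sketch the sign of the returned value decides whether a boundary is declared, so the choice of `penalty_weight` directly controls the trade-off between missed and false boundaries. That dependence on a hand-tuned value is the kind of sensitivity that estimating thresholds from the processed data is meant to reduce.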
