Speaker-based segmentation for audio data indexing

In this paper, we address the problem of speaker-based segmentation, which is the first necessary step for several indexing tasks. It consists of recognizing, from their voices, the sequence of people engaged in a conversation. In our context, we assume no prior knowledge of the speaker characteristics (no speaker model, no speech model, no training phase). However, we assume that people do not speak simultaneously. Our segmentation technique takes advantage of two different types of segmentation algorithms. It is organized in two passes: first, the most likely speaker change points are detected; then, they are validated or discarded. Our algorithm is efficient at detecting speaker change points, even ones close to one another, and is thus suited to segmenting conversations containing segments of any length.
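The two-pass organization described above can be illustrated with a minimal sketch. It is not the algorithm of this paper: the abstract does not specify the distance measure or the validation criterion, so the sketch assumes a normalized mean difference between adjacent sliding windows for the first pass and a BIC-style test on one-dimensional Gaussians for the second. All function names, parameters, and default values (`detect_candidates`, `validate_candidates`, `segment_speakers`, `win`, `threshold`, `min_gap`, `penalty`) are hypothetical, and each frame is reduced to a single scalar feature for brevity, whereas a real system would use multidimensional acoustic features.

```python
import math
from statistics import fmean, pvariance

VAR_FLOOR = 1e-6  # assumption: floor to avoid log(0) on constant segments


def delta_bic(left, right, penalty=1.0):
    """BIC gain of modeling two segments separately rather than jointly.

    Each segment is modeled by a 1-D Gaussian (an assumption of this sketch).
    A positive value favors a speaker change between `left` and `right`.
    """
    def n_log_var(xs):
        return len(xs) * math.log(max(pvariance(xs), VAR_FLOOR))

    both = left + right
    gain = 0.5 * (n_log_var(both) - n_log_var(left) - n_log_var(right))
    # The split model has 2 extra parameters (one mean, one variance).
    return gain - penalty * 0.5 * 2 * math.log(len(both))


def detect_candidates(feats, win=20, threshold=2.0, min_gap=10):
    """Pass 1: pick peaks of a distance between adjacent sliding windows."""
    dist = {}
    for t in range(win, len(feats) - win + 1):
        a, b = feats[t - win:t], feats[t:t + win]
        pooled_std = math.sqrt(0.5 * (pvariance(a) + pvariance(b))) + 1e-6
        dist[t] = abs(fmean(a) - fmean(b)) / pooled_std
    picked = []
    # Greedy peak picking: strongest first, at least `min_gap` frames apart.
    for t in sorted(dist, key=dist.get, reverse=True):
        if dist[t] < threshold:
            break
        if all(abs(t - p) >= min_gap for p in picked):
            picked.append(t)
    return sorted(picked)


def validate_candidates(feats, candidates, penalty=1.0):
    """Pass 2: keep a candidate only if the BIC test supports a change."""
    candidates = sorted(candidates)
    kept = []
    for i, c in enumerate(candidates):
        start = kept[-1] if kept else 0  # last validated change point
        end = candidates[i + 1] if i + 1 < len(candidates) else len(feats)
        left, right = feats[start:c], feats[c:end]
        if left and right and delta_bic(left, right, penalty) > 0:
            kept.append(c)
    return kept


def segment_speakers(feats):
    """Run both passes and return the validated change points (frame indices)."""
    return validate_candidates(feats, detect_candidates(feats))
```

Calling `segment_speakers(feats)` on a list of per-frame scalar features returns the frame indices retained as speaker change points. Because the second pass tests each candidate against the segments delimited by its neighbors rather than against fixed-size windows, a spurious candidate inside a homogeneous segment is discarded without preventing the detection of genuine changes nearby.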
