Speaker detection without models

In order to capture sequential information and to take advantage of extended training data conditions, we developed an algorithm for speaker detection that scores a test segment by comparing it directly to similar instances of that speech in the training data. This non-parametric technique, though at an early stage in its development, achieves error rates close to 1% on the NIST 2001 extended data task and performs extremely well in combination with a standard Gaussian mixture model system. We also present a new scoring method that significantly improves performance by capturing only positive evidence.

[1]  Lawrence G. Bahler,et al.  Voice identification using nearest-neighbor distance measure , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[3]  Larry Gillick,et al.  Progress in speaker recognition at dragon systems , 1998, ICSLP.

[4]  Andreas Stolcke,et al.  THE SRI MARCH 2000 HUB-5 CONVERSATIONAL SPEECH TRANSCRIPTION SYSTEM , 2000 .

[5]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[6]  Douglas A. Reynolds,et al.  The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[7]  Patrick Wambacq,et al.  Data driven example based continuous speech recognition , 2003, INTERSPEECH.

[8]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[9]  Andreas Stolcke,et al.  Modeling duration patterns for speaker recognition , 2003, INTERSPEECH.

[10]  Hagai Aronowitz,et al.  Text independent speaker recognition using speaker dependent word spotting , 2004, INTERSPEECH.

[11]  S. Axelrod,et al.  Combination of hidden Markov models with dynamic time warping for speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.