Using prosodic and lexical information for speaker identification

We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker ID evaluation, which provides much more test and training data than has traditionally been available to similar speaker ID investigations. In addition, we have had access to a detailed prosodic feature database of Switchboard-I conversations, including data not previously applied to speaker ID. We describe two baseline acoustic systems, an approach using Gaussian Mixture Models, and an LVCSR-based speaker ID system. These results are compared to and combined with two larger time-scale systems: a system based on an “idiolect” language model. and a system making use of the contents of the prosody database. We find that, with sufficient test and training data, suprasegmental information can significantly enhance the performance of traditional speaker ID systems.

[1]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[2]  Larry Gillick,et al.  Speaker Recognition on Single- and Multispeaker Data , 2000, Digit. Signal Process..

[3]  Douglas A. Reynolds,et al.  Comparison of background normalization methods for text-independent speaker verification , 1997, EUROSPEECH.

[4]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[5]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[6]  Larry P. Heck,et al.  Modeling dynamic prosodic variation for speaker verification , 1998, ICSLP.

[7]  B. Atal Automatic Speaker Recognition Based on Pitch Contours , 1969 .

[8]  Gökhan Tür,et al.  Prosody-based automatic segmentation of speech into sentences and topics , 2000, Speech Commun..

[9]  B. Belkhouche,et al.  Acknowledgements We Would like to Thank , 1993 .

[10]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[11]  Don McAllaster,et al.  Improvements in recognition of conversational telephone speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[12]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.