Sub-word speaker verification using data fusion methods

Speaker verification is a rapidly maturing technology that is becoming available for commercial applications. In this paper, we investigate the application of data fusion methods to sub-word implementations of speaker verification. At the sub-word level, we exploit the diversity of the information provided by the neural tree network and the Gaussian mixture model to build a more robust sub-word model. Phrase-level scores are obtained for each modeling approach and then combined using the linear opinion pool. In addition to exploiting the diversity of the model scores, we apply the concept of redundancy by partitioning the input data with a leave-one-out approach. This allows us to generate several models and to accommodate the small training sets imposed by our specific applications. The theoretical results of this analysis have been integrated into a system that has been tested on several databases collected in landline and cellular environments, and these results are included in this paper. We have found that proper data fusion techniques typically reduce the error rate by a factor of two.
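
The two ideas named in the abstract, the linear opinion pool and leave-one-out partitioning, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the equal weights, the example scores, and the utterance labels are all hypothetical, and the sketch assumes the two model scores have already been normalized to a comparable range.

```python
from typing import List, Sequence


def linear_opinion_pool(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fuse model scores as a weighted sum with non-negative weights summing to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


def leave_one_out_partitions(items: Sequence[str]) -> List[List[str]]:
    """Return one training subset per item, each omitting exactly that item."""
    return [list(items[:i]) + list(items[i + 1:]) for i in range(len(items))]


# Hypothetical phrase-level scores from the two model types (assumed to lie in [0, 1]).
ntn_score = 0.80  # neural tree network
gmm_score = 0.60  # Gaussian mixture model

# Equal weights are an assumption; the abstract does not state how weights are chosen.
fused = linear_opinion_pool([ntn_score, gmm_score], [0.5, 0.5])

# Four hypothetical enrollment utterances give four overlapping training subsets,
# so four models can be trained from a small amount of data.
partitions = leave_one_out_partitions(["utt1", "utt2", "utt3", "utt4"])

print(f"fused score: {fused:.2f}")
for subset in partitions:
    print(subset)
```

In this sketch, a verification decision would follow by comparing the fused score against a threshold; how that threshold is set is outside what the abstract describes.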
