Enhanced speaker recognition based on score level fusion of AHS and HMM

The history of speaker recognition dates back some four decades, yet the technology is still not reliable enough to serve as a standalone security system. This paper focuses on enhancing speaker recognition by fusing the likelihood scores generated by the arithmetic harmonic sphericity (AHS) and hidden Markov model (HMM) techniques. Owing to the contrastive nature of AHS and HMM, we observed a significant improvement in true acceptance rate at a 5% false acceptance rate when this fusion technique was evaluated on two different datasets: 22% on YOHO and 6% on the USF multimodal biometric dataset. Performance improved on both datasets, although performance on YOHO was higher than on the USF dataset because the USF dataset is a noisy outdoor dataset, whereas YOHO is an indoor dataset.
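
To make the two quantities in the abstract concrete, the following is a minimal sketch of score-level fusion and of the true acceptance rate measured at a fixed false acceptance rate. The abstract does not say which normalization or fusion rule was used, so the min-max normalization, the weighted sum rule, and all function names here are illustrative assumptions rather than the authors' method. The sketch also assumes that both matchers' scores are oriented so that a higher value means a better match; a distance-type score would need to be negated first.

```python
def min_max_normalize(scores):
    """Map raw scores to [0, 1]; assumes a higher score means a better match."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores identical, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_sum(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two normalized score lists covering the same trials.

    scores_a and scores_b stand in for the per-trial AHS and HMM scores;
    weight_a is a hypothetical tuning parameter, not a value from the paper.
    """
    norm_a = min_max_normalize(scores_a)
    norm_b = min_max_normalize(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(norm_a, norm_b)]


def tar_at_far(genuine, impostor, target_far=0.05):
    """Best true acceptance rate among thresholds whose false acceptance
    rate does not exceed target_far. A trial is accepted when score >= threshold."""
    best = 0.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        if far <= target_far:
            tar = sum(s >= threshold for s in genuine) / len(genuine)
            best = max(best, tar)
    return best
```

In such a setup, the fused scores for genuine and impostor trials would be passed to `tar_at_far` and compared against the same figure computed for each matcher alone.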
