Speaker Verification using Hidden Markov Models in a Multilingual Text-constrained Framework

This paper expands upon previous work, using a multilingual framework for text-constrained speaker verification. The framework attempts to overcome some of the restrictions of previously developed monolingual text-constrained techniques. Pseudo-syllabic segmentation is used to extract the regions on which the constrained recognition operates. A comparison between Gaussian mixture models (GMMs) and hidden Markov models (HMMs) is presented for modelling these syllabic events. Results are reported on the NIST 2004 speaker recognition evaluation corpus. They suggest that temporal patterns are present within the frame sequences and can be exploited through Markovian modelling. The HMM-based system is also compared against a traditional global acoustic GMM-UBM speaker verification system, with encouraging results.