Enhancement of mismatched conditions in speaker recognition for multimedia applications

This paper investigates the performance of an HMM-based text-independent speaker recognition system under different model and feature combinations, for both matched and mismatched speech coding conditions. The effects of changing the HMM topology and the acoustic features are investigated first. Training and testing the models using only the voiced segments of the samples is then considered. The best model structure in each topology is then used to test how speech codecs used in multimedia applications, such as G.729 at 8 kb/s and G.723.1 at 5.3 and 6.3 kb/s, affect performance in matched and mismatched conditions. To improve performance in mismatched conditions, two techniques are investigated: MAP-based adaptation with different amounts of coded training data, and a diagonal affine transform that adapts the coded cepstral features to the original PCM cepstral features. Results show that the proposed techniques improve speaker recognition performance and produce results comparable to those of the matched-condition test.
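As an illustration of the diagonal affine transform mentioned above, the following minimal sketch maps each cepstral coefficient of a coded frame to its PCM counterpart with an independent scale and offset. It rests on assumptions not stated in the abstract: the coded and PCM frames are taken to be time-aligned pairs, the parameters are estimated by per-dimension least squares, and the function names `fit_diagonal_affine` and `apply_diagonal_affine` are hypothetical. The estimation procedure actually used in the paper may differ.

```python
from statistics import fmean


def fit_diagonal_affine(coded, pcm):
    """Fit pcm[t][d] ~ a[d] * coded[t][d] + b[d] for each dimension d.

    Assumption: coded[t] and pcm[t] are cepstral vectors of the same
    (time-aligned) frame, and least squares is the fitting criterion.
    """
    if len(coded) != len(pcm) or not coded:
        raise ValueError("need equal, non-zero numbers of coded and PCM frames")
    dims = len(coded[0])
    a, b = [], []
    for d in range(dims):
        x = [frame[d] for frame in coded]
        y = [frame[d] for frame in pcm]
        mean_x, mean_y = fmean(x), fmean(y)
        var_x = sum((xi - mean_x) ** 2 for xi in x)
        cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        # Fall back to a unit scale if the coefficient never varies.
        scale = cov_xy / var_x if var_x > 0 else 1.0
        a.append(scale)
        b.append(mean_y - scale * mean_x)
    return a, b


def apply_diagonal_affine(frames, a, b):
    """Map coded cepstral frames toward the PCM feature space."""
    return [[a_d * c + b_d for c, a_d, b_d in zip(frame, a, b)]
            for frame in frames]
```

Because the transform is diagonal, it needs only two parameters per cepstral coefficient, which keeps the amount of paired data required for estimation small compared with a full-matrix transform.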