Application of Acoustic Discriminative Training in an Ergodic HMM for Speaker Identification

We present a novel architecture for a speaker recognition system operating over the telephone. The proposed system introduces acoustic information into an HMM-based recognizer by using a phonetic classifier during the training phase. Three broad phonetic classes are defined: voiced frames, unvoiced frames, and transitions. We design speaker templates by combining four single-state HMMs into a four-state HMM after re-estimating the transition probabilities. Experiments conducted with two databases are reported, and the results show that this architecture performs better than alternatives that do not use phonetic classification.
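The template construction described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: each single-state model is taken to be a diagonal-covariance Gaussian, the transition probabilities are estimated by counting label-to-label transitions in the classifier output (a simple stand-in for the re-estimation step), and the state indices 0 to 3 are placeholders. The function names are hypothetical.

```python
import numpy as np


def estimate_transitions(labels, n_states, smoothing=1.0):
    """Count-based estimate of a fully connected (ergodic) transition matrix.

    Assumption: `labels` is a per-frame sequence of state indices produced by
    a phonetic classifier. Additive smoothing keeps every transition possible.
    """
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts[prev, nxt] += 1.0
    # Normalize each row so it is a probability distribution.
    return counts / counts.sum(axis=1, keepdims=True)


def log_gaussian(frames, mean, var):
    """Per-frame log-density of a diagonal-covariance Gaussian (assumed model)."""
    diff = frames - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)


def forward_log_likelihood(frames, means, variances, trans, initial):
    """Log-likelihood of `frames` under the combined HMM (log-domain forward pass)."""
    # Emission log-probabilities, shape (n_frames, n_states).
    log_b = np.stack(
        [log_gaussian(frames, m, v) for m, v in zip(means, variances)], axis=1
    )
    log_trans = np.log(trans)
    log_alpha = np.log(initial) + log_b[0]
    for t in range(1, len(frames)):
        # Sum over the previous state for each current state, in the log domain.
        log_alpha = (
            np.logaddexp.reduce(log_alpha[:, None] + log_trans, axis=0) + log_b[t]
        )
    return np.logaddexp.reduce(log_alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, dim = 4, 3
    labels = [0, 0, 1, 2, 3, 3, 0]                    # toy classifier output
    frames = rng.normal(size=(len(labels), dim))      # toy feature vectors
    means = rng.normal(size=(n_states, dim))
    variances = np.ones((n_states, dim))
    trans = estimate_transitions(labels, n_states)
    initial = np.full(n_states, 1.0 / n_states)
    print(forward_log_likelihood(frames, means, variances, trans, initial))
```

In an identification setting, one such model would be built per enrolled speaker and a test utterance assigned to the speaker whose model yields the highest log-likelihood; that decision rule is likewise an assumption of this sketch.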