A Two-Stage Procedure for Phone-Based Speaker Verification

Many approaches to speaker recognition have traditionally been based more or less directly on techniques borrowed from speech recognition, e.g., Hidden Markov Models (HMMs). These approaches ignore the fact that the two problems are fundamentally different: ideally, speech recognition deals only with linguistic features, whereas speaker recognition deals only with non-linguistic features. The two cannot, however, be separated; when a sentence is uttered, the non-linguistic speaker information is observed together with the linguistic information. This is why a speech recogniser can also be used as a speaker recogniser. In this paper, a two-stage procedure for speaker verification is presented, in which speech recognition (segmentation) and speaker verification are carried out separately. In the first stage, HMMs are used to identify phone segments; in the second stage, phone-dependent Radial Basis Function (RBF) networks are used to verify the claimed speaker identity. Phone modelling is important because different phones characterise different aspects of a speaker. It is found here that phone modelling makes it easier to reject impostors, because successful impostors are usually successful only for specific phones.
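The second stage can be illustrated with a minimal sketch. It assumes the first stage has already produced a list of `(phone_label, frames)` segments, where each frame is a feature vector (for example, cepstral coefficients; the feature type is an assumption). Each phone has its own single-output RBF network with Gaussian hidden units and a linear output layer. The names `RBFNetwork`, `segment_score` and `verify`, the per-segment frame averaging, the mean over phone segments, and the fixed decision threshold are all hypothetical choices made for illustration; the procedure described above does not specify how phone-level outputs are combined.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


@dataclass
class RBFNetwork:
    """Hypothetical single-output RBF network for one phone of one speaker."""
    centres: List[List[float]]   # one centre vector per Gaussian hidden unit
    widths: List[float]          # one width (standard deviation) per hidden unit
    weights: List[float]         # linear output weight per hidden unit
    bias: float = 0.0

    def output(self, x: Frame) -> float:
        """Weighted sum of Gaussian basis-function activations plus a bias."""
        total = self.bias
        for c, s, w in zip(self.centres, self.widths, self.weights):
            dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += w * math.exp(-dist2 / (2.0 * s * s))
        return total


def segment_score(net: RBFNetwork, frames: Sequence[Frame]) -> float:
    """Average the network output over all frames of one phone segment."""
    return sum(net.output(f) for f in frames) / len(frames)


def verify(
    segments: Sequence[Tuple[str, Sequence[Frame]]],
    phone_nets: Dict[str, RBFNetwork],
    threshold: float,
) -> Tuple[bool, float]:
    """Accept the claimed identity if the mean phone-segment score reaches the threshold.

    Segments whose phone has no network, or that contain no frames, are skipped.
    If nothing can be scored, the claim is rejected with a score of -inf.
    """
    scores = []
    for phone, frames in segments:
        net = phone_nets.get(phone)
        if net is None or len(frames) == 0:
            continue
        scores.append(segment_score(net, frames))
    if not scores:
        return False, float("-inf")
    utterance_score = sum(scores) / len(scores)
    return utterance_score >= threshold, utterance_score
```

Keeping one score per phone segment, rather than pooling all frames before scoring, leaves the per-phone evidence visible. That separation is what allows the observation above to be exploited, namely that an impostor who scores well on a few specific phones still pulls down the scores of the remaining phones.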