Many approaches to speaker recognition are based more or less directly on techniques borrowed from speech recognition, e.g. Hidden Markov Models. These approaches overlook the fact that the two problems are very different. Ideally, speech recognition deals only with linguistic features, whereas speaker recognition deals only with non-linguistic features. The two cannot, however, be separated: when a sentence is uttered, the non-linguistic speaker information is always observed together with the linguistic information. This is why a speech recogniser can also be used as a speaker recogniser. This paper presents a two-stage procedure for speaker verification in which speech recognition (segmentation) and speaker verification are carried out separately. In the first stage, Hidden Markov Models identify phone segments; in the second stage, phone-dependent Radial Basis Function networks verify the claimed speaker identity. Phone modelling is important because different phones characterise different aspects of a speaker. It is found here that phone modelling makes it easier to reject impostors, because successful impostors are usually successful only for specific phones.
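The second stage of the procedure can be illustrated with a minimal sketch. It assumes that stage one (for example, an HMM forced alignment) has already produced a list of phone-labelled segments, and that one small Gaussian RBF network per phone has been trained for the claimed speaker. The function names `rbf_score` and `verify`, the averaging of per-phone scores, and the threshold of 0.5 are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import defaultdict


def rbf_score(frame, centres, weights, bias, width):
    """Output of a Gaussian RBF network for one feature frame (hypothetical form)."""
    total = bias
    for centre, weight in zip(centres, weights):
        sq_dist = sum((x - c) ** 2 for x, c in zip(frame, centre))
        total += weight * math.exp(-sq_dist / (2.0 * width ** 2))
    return total


def verify(segments, phone_nets, threshold=0.5):
    """Score a claimed identity from phone-labelled segments.

    segments:   list of (phone_label, list_of_frames), assumed to come from stage one.
    phone_nets: dict mapping phone_label -> (centres, weights, bias, width),
                i.e. the claimed speaker's phone-dependent networks.
    Returns (accepted, per_phone_mean_scores).
    """
    per_phone = defaultdict(list)
    for phone, frames in segments:
        if phone not in phone_nets:
            continue  # no model for this phone: skip it
        centres, weights, bias, width = phone_nets[phone]
        for frame in frames:
            per_phone[phone].append(rbf_score(frame, centres, weights, bias, width))

    # Assumption: average within each phone, then across phones, so that a
    # good match on a single phone cannot dominate the decision.
    phone_means = {p: sum(s) / len(s) for p, s in per_phone.items()}
    if not phone_means:
        return False, {}
    overall = sum(phone_means.values()) / len(phone_means)
    return overall >= threshold, phone_means


# Toy usage with made-up 2-D features and one RBF centre per phone.
nets = {
    "a": ([[0.0, 0.0]], [1.0], 0.0, 1.0),
    "s": ([[1.0, 1.0]], [1.0], 0.0, 1.0),
    "m": ([[2.0, 2.0]], [1.0], 0.0, 1.0),
}
claimed = [("a", [[0.0, 0.0]]), ("s", [[1.0, 1.0]]), ("m", [[2.0, 2.0]])]
impostor = [("a", [[0.0, 0.0]]), ("s", [[9.0, 9.0]]), ("m", [[9.0, 9.0]])]
print(verify(claimed, nets))
print(verify(impostor, nets))
```

In the toy data the impostor matches the claimed speaker on one phone only. Because scores are kept per phone before they are combined, that single match does not carry the decision, which is the effect the abstract attributes to phone modelling.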