Speaker recognition identifies a speaker's voice from among a group of known speakers. A common approach classifies cepstral coefficients of the speaker's voice, using a Gaussian mixture model (GMM) to model each speaker. In this paper we try to fool a speaker recognition system using additive noise, such that an intruder is recognized as a target user. Our attack selects a mixture component from the target user's GMM and inverts the cepstral transformation to produce noise samples. On our five-speaker database, we achieve an attack success rate of 50% with a noise signal at 10 dB SNR, and 95% when the noise power is increased to 0 dB SNR. The importance of this attack lies in its simplicity and flexibility: it can be employed in real time with no processing of the attacker's voice, and little computation is needed at the moment of detection, allowing the attack to be performed by a small portable device. For any target user, knowing that user's model or a voice sample is sufficient to compute the attack signal, and the intruder need only play it while speaking to be classified as the victim.
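The noise-generation step described above can be illustrated with a minimal sketch. It is not the implementation used in the paper and rests on several assumptions: the cepstrum is taken to be a plain orthonormal DCT-II of the log-magnitude spectrum (no mel filterbank), the selected component is the one with the largest mixture weight (the selection rule is not stated here), and the noise is synthesized as a sum of random-phase sinusoids. All function and parameter names (`cepstrum_to_magnitude`, `shaped_noise`, `scale_to_snr`, `attack_noise`, `speech_power`) are hypothetical.

```python
import math
import random


def cepstrum_to_magnitude(cepstrum, n_bins):
    """Invert a truncated DCT-II cepstrum to a smooth magnitude spectrum."""
    magnitude = []
    for n in range(n_bins):
        log_mag = 0.0
        for k, c in enumerate(cepstrum):
            # Orthonormal DCT-II scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n_bins)
            log_mag += scale * c * math.cos(math.pi * k * (2 * n + 1) / (2 * n_bins))
        magnitude.append(math.exp(log_mag))
    return magnitude


def shaped_noise(magnitude, n_samples, rng):
    """Sum random-phase sinusoids whose amplitudes follow the given spectrum."""
    n_bins = len(magnitude)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_bins)]
    noise = []
    for t in range(n_samples):
        sample = 0.0
        for k in range(n_bins):
            # Bin k sits at normalized frequency (k + 0.5) / (2 * n_bins).
            sample += magnitude[k] * math.cos(math.pi * (k + 0.5) * t / n_bins + phases[k])
        noise.append(sample)
    return noise


def scale_to_snr(noise, speech_power, snr_db):
    """Rescale the noise so that speech_power / noise_power matches snr_db."""
    noise_power = sum(x * x for x in noise) / len(noise)
    target_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_power / noise_power)
    return [gain * x for x in noise]


def attack_noise(weights, means, speech_power, snr_db, n_samples, n_bins=64, seed=0):
    """Build a noise signal from one component of a target speaker's GMM."""
    rng = random.Random(seed)
    # Assumption: pick the component with the largest mixture weight.
    component = max(range(len(weights)), key=lambda i: weights[i])
    magnitude = cepstrum_to_magnitude(means[component], n_bins)
    return scale_to_snr(shaped_noise(magnitude, n_samples, rng), speech_power, snr_db)
```

In this sketch the noise depends only on the target model; `speech_power` is a nominal level used to set the SNR, so the attacker's own voice is never analyzed. Lowering `snr_db` from 10 to 0 raises the noise power tenfold, which corresponds to the stronger attack setting reported above.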