Paper Information - Replay Attack Detection Using DNN for Channel Discrimination

Replay Attack Detection Using DNN for Channel Discrimination

Voice is projected to be the next input interface for portable devices. The increased use of audio interfaces can be mainly attributed to the success of speech and speaker recognition technologies. With these advances comes the risk of criminal threats, as attackers are reportedly trying to access sensitive information using diverse voice spoofing techniques. Among them, replay attacks pose a real challenge to voice biometrics. This paper addresses the problem by proposing a deep learning architecture in tandem with low-level cepstral features. We investigate the use of a deep neural network (DNN) to discriminate between the different channel conditions available in the ASVspoof 2017 dataset, namely recording, playback and session conditions. The high-level feature vectors derived from this network are used to discriminate between genuine and spoofed audio. Two kinds of low-level features are utilized: state-of-the-art constant-Q cepstral coefficients (CQCC), and our proposed high-frequency cepstral coefficients (HFCC), which are derived from the high-frequency spectrum of the audio. The fusion of both features proved effective in generalizing across the diverse replay attacks seen in the evaluation set of the ASVspoof 2017 challenge, with an equal error rate of 11.5%, which is 53% better than the baseline Gaussian Mixture Model (GMM) applied to CQCC.
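
The abstract reports its result as an equal error rate (EER) and as a relative improvement over a baseline. The following is a minimal sketch of how both quantities are commonly computed from detection scores; it is not the authors' evaluation code. The function names, the toy scores, and the convention that a higher score means "more likely genuine" are assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the trial is more likely genuine.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, system_eer):
    """Relative EER reduction of a system with respect to a baseline."""
    return (baseline_eer - system_eer) / baseline_eer


# Hypothetical toy scores, not taken from the paper.
genuine = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.75]
toy_eer = equal_error_rate(genuine, spoof)
```

As a consistency check on the abstract's rounded figures, an EER of 11.5% that is 53% better than the baseline implies a baseline EER of roughly 11.5 / (1 - 0.53), or about 24.5%. That value is back-computed here and is not quoted from the paper; `relative_reduction(24.5, 11.5)` gives approximately 0.53.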

[1] Kong-Aik Lee, et al. The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection, 2017, INTERSPEECH.

[2] Aleksandr Sizov, et al. ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge, 2017, IEEE Journal of Selected Topics in Signal Processing.

[3] Sébastien Marcel, et al. Cross-Database Evaluation of Audio-Based Spoofing Detection Systems, 2016, INTERSPEECH.

[4] Kong-Aik Lee, et al. RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5] Sébastien Marcel, et al. Presentation Attack Detection Using Long-Term Spectral Statistics for Trustworthy Speaker Verification, 2016, 2016 International Conference of the Biometrics Special Interest Group (BIOSIG).

[6] Aleksandr Sizov, et al. Joint Speaker Verification and Antispoofing in the i-Vector Space, 2015, IEEE Transactions on Information Forensics and Security.

[7] Nicholas W. D. Evans, et al. Re-assessing the threat of replay spoofing attacks against automatic speaker verification, 2014, 2014 International Conference of the Biometrics Special Interest Group (BIOSIG).

[8] Tomi Kinnunen, et al. A comparison of features for synthetic speech detection, 2015, INTERSPEECH.

[9] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[10] Avery Wang, et al. An Industrial Strength Audio Search Algorithm, 2003, ISMIR.

[11] Eduardo Lleida, et al. Preventing replay attacks on speaker verification systems, 2011, 2011 Carnahan Conference on Security Technology.

[12] Haizhou Li, et al. A study on replay attack and anti-spoofing for text-dependent speaker verification, 2014, Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific.

[13] Tomi Kinnunen, et al. Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015, 2016, INTERSPEECH.

[14] Goutam Saha, et al. Overview of BTAS 2016 speaker anti-spoofing competition, 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[15] Sébastien Marcel, et al. On the vulnerability of speaker verification to realistic voice spoofing, 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[16] Bin Ma, et al. The RedDots data collection for speaker recognition, 2015, INTERSPEECH.

[17] Nicholas W. D. Evans, et al. A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients, 2016, Odyssey.