Replay Attack Detection Using Linear Prediction Analysis-Based Relative Phase Features

Recent studies have reported the success of linear prediction analysis (LPA)-related features for replay attack detection. These features are extracted as short-term spectral features and exploit the imperfections that recording and playback devices introduce into LPA-based signals. However, existing work on LPA-based signals focuses only on magnitude-based features and ignores phase-based features. In this paper, we propose two novel LPA-based relative phase features, namely, linear prediction residual-based relative phase (LPR-RP) and linear prediction analysis estimated speech-based relative phase (LPAES-RP). The key idea of both LPR-RP and LPAES-RP is to extract phase information from LPA-based signals. For the LPR-RP feature, we modify relative phase (RP) feature extraction so that it operates on the linear prediction residual (LPR), that is, the difference between the original/raw speech signal and the LPA estimated speech (LPAES) signal, instead of on the original/raw speech signal. The LPAES-RP feature instead uses the LPAES signal in place of the original/raw speech signal. Because traces of recording and playback device artifacts are the primary evidence for detecting a replayed signal, the imperfections captured in the LPR and LPAES signals are expected to provide useful phase information for the replay attack detection task. In addition to using the individual LPR-RP/LPAES-RP features, we combine the proposed features at score level with two standard features, mel-frequency cepstral coefficients (MFCC) and constant Q transform cepstral coefficients (CQCC), and with the original RP feature, to further improve the detection decision. The performance of the proposed LPR-RP/LPAES-RP features and of their combinations is evaluated using the ASVspoof 2017 version 2 database. On the evaluation subset, the proposed LPR-RP and LPAES-RP features achieve a promising improvement over the baseline features (MFCC/CQCC). Moreover, the combined system of LPR-RP, RP, and CQCC obtains an equal error rate of 9.26%.
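
As a rough illustration of the two LPA-based signals described above, the following Python sketch computes an LPA estimated speech (LPAES) signal and the corresponding linear prediction residual (LPR) for a single frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the autocorrelation method, the prediction order of 12, the small regularization term, and the function names `lpc_autocorr` and `lpaes_and_lpr` are all hypothetical choices made for illustration, and the subsequent relative phase extraction is not shown.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Estimate LP coefficients a[1..order] with the autocorrelation method.

    Assumption: the normal equations R a = r are solved directly; a
    Levinson-Durbin recursion would give the same result more efficiently.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0], r[1], ..., r[order].
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Symmetric Toeplitz autocorrelation matrix with entries r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Tiny ridge term (an assumption) to keep the system well conditioned.
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1 : order + 1])


def lpaes_and_lpr(frame, order=12):
    """Return (LPAES, LPR) for one frame.

    LPAES[n] = sum_k a[k] * s[n - k]   (speech predicted from past samples)
    LPR[n]   = s[n] - LPAES[n]         (prediction residual)
    """
    frame = np.asarray(frame, dtype=float)
    a = lpc_autocorr(frame, order)
    est = np.zeros_like(frame)
    for k in range(1, order + 1):
        # Contribution of the sample k steps in the past.
        est[k:] += a[k - 1] * frame[:-k]
    residual = frame - est
    return est, residual
```

In such a sketch, either `est` (for an LPAES-RP-style feature) or `residual` (for an LPR-RP-style feature) would replace the raw frame as the input to the relative phase extraction stage.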
