Countermeasures to Replay Attacks: A Review

ABSTRACT A replay attack is an attempt to gain unauthorized access to an automatic speaker verification (ASV) system by playing back pre-recorded speech samples of a target speaker. It is a low-technology spoofing attack that requires only a high-quality recording and playback device, which makes it the most accessible and a highly effective way of spoofing state-of-the-art ASV systems. Researchers have recently given wide attention to the development of replay attack countermeasures. This study provides a detailed review of recently proposed replay attack detection methods. Different speech signal attributes, such as spectral magnitude features, modulation features, and phase and excitation source features, have been explored for the replay detection task. Many of the proposed methods perform well, but given the steady advancement in device manufacturing technologies, more effort is required to develop generalized replay attack countermeasures. From this study, we infer that exploring excitation source information with suitable signal processing algorithms may be useful for the replay detection task. Such information can also complement spectral features to obtain generalized solutions. The linear prediction (LP) residual signal represents excitation source information implicitly. As an immediate future scope, the potential of the LP residual signal can be explored for the detection of replay signals. Alternatively, explicit excitation source information such as pitch, epoch strength, and the glottal flow derivative (GFD) signal can also be used for the detection of replay signals. The study concludes with a discussion of frameworks for the proposed excitation information-based research directions.
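To make the notion of an LP residual concrete, the following is a minimal sketch of how it might be computed for a single speech frame with the autocorrelation method of linear prediction. It is not the procedure used in any of the reviewed methods. The function name `lp_residual`, the default order of 10, and the small regularization term are assumptions chosen for illustration, and windowing and pre-emphasis are omitted.

```python
import numpy as np

def lp_residual(frame, order=10):
    """Illustrative LP residual of one frame (autocorrelation method).

    Returns (residual, coeffs), where coeffs[k] weights x[n-k-1] in the
    prediction of x[n]. Name, order and regularization are assumptions.
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    if r[0] == 0.0:
        # Silent frame: nothing to predict.
        return np.zeros_like(x), np.zeros(order)
    # Toeplitz normal equations R a = r[1:], lightly regularized for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Predicted sample: x_hat[n] = sum_k a[k] * x[n-k-1].
    pred = np.zeros_like(x)
    for k in range(order):
        pred[k + 1:] += a[k] * x[:len(x) - k - 1]
    # The residual is the part of the signal the all-pole model cannot predict.
    return x - pred, a
```

The residual returned here is the prediction error; features derived from it would then be one possible input to a replay detector, alongside or in place of spectral features.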
