Detection of Replay-Spoofing Attacks Using Frequency Modulation Features

Prevention of malicious spoofing attacks is currently acknowledged as a priority area of investigation for the deployment of automatic speaker verification systems. Various features of speech signals have been used to counter such attacks. Among the different spoofing attack variants, replay attacks pose a significant threat because they require no expert knowledge and are difficult to detect. This paper proposes the use of spectral centroid based frequency modulation (FM) features, which we term spectral centroid deviation (SCD), for replay attack detection. Spectral centroid frequency (SCF) and spectral centroid magnitude coefficient (SCMC) features extracted from the same front-end as SCD are also investigated as complementary features. Evaluations on the ASVspoof 2017 dataset indicate that the proposed SCD features with a Gaussian Mixture Model (GMM) back-end are highly capable of discriminating genuine from replay-spoofed speech, providing an equal error rate improvement greater than 60% relative to the CQCC baseline system from the ASVspoof 2017 challenge. Interestingly, experiments also reveal that the proposed SCD features exhibit increased variance for replay-spoofed speech relative to genuine speech, particularly in the lowest and highest frequency subbands.
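
As a rough illustration of the kind of quantity a spectral centroid front-end works with, the sketch below computes a magnitude-weighted mean frequency in each subband of a single speech frame. It is a minimal sketch under stated assumptions, not the front-end used in the paper: the function name `subband_centroid_frequencies`, the Hamming window, the equal-width rectangular subbands, and the default of four subbands are all hypothetical choices made for illustration, and the SCD feature itself is not reproduced because the abstract does not define it.

```python
import numpy as np


def subband_centroid_frequencies(frame, sample_rate, n_subbands=4):
    """Magnitude-weighted mean frequency (Hz) in each subband of one frame.

    Assumptions (not from the paper): Hamming window, equal-width
    rectangular subbands, magnitude (not power) weighting.
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Split the FFT bins into contiguous, non-overlapping subbands.
    edges = np.linspace(0, len(freqs), n_subbands + 1, dtype=int)

    centroids = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mag = magnitude[lo:hi]
        total = band_mag.sum()
        if total > 0:
            centroids.append(float((freqs[lo:hi] * band_mag).sum() / total))
        else:
            # Silent band: fall back to the band's midpoint frequency.
            centroids.append(float(freqs[lo:hi].mean()))
    return np.array(centroids)


# Usage: a 25 ms frame of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
centroids = subband_centroid_frequencies(tone, sample_rate)
print(centroids)
```

For the pure tone in the usage example, the centroid of the lowest subband falls close to the tone frequency, while the remaining subbands contain only window leakage, so their centroids carry little information.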
