An Audio Fingerprinting Approach to Replay Attack Detection on ASVSPOOF 2017 Challenge Data

Replay attacks, in which an impostor plays back a recording of a genuine user utterance, are a major vulnerability of speaker verification systems. Two highly likely scenarios are the covert recording of spoken access trials and the reuse of earlier genuine recordings after fraudulent access to transmission channels or storage devices. In both scenarios, an audio fingerprint-based approach that compares each access trial with all previous recordings from the claimed speaker is well suited to replay attack detection. However, the ASVspoof 2017 rules did not allow the use of the original RedDots audio files (the spoofed trials are replayed versions of RedDots recordings), which ruled out regular participation with a fingerprint-based system, because those original files are needed to build the bank of previous-access audio fingerprints. We therefore agreed with the organizers to run, and submit on time, a parallel fingerprint-based evaluation on exactly the same blind test data under an alternative but realistic (deployable) evaluation scenario. We obtained an Equal Error Rate (EER) of 8.91% in detecting replayed versus genuine trials, but because we used the original RedDots files this result is not comparable, for ranking purposes, with those of the actual Challenge participants. It does, however, provide insight into the potential of audio fingerprinting, especially for high audio-quality attacks, where state-of-the-art acoustic antispoofing systems perform poorly: the best ASVspoof 2017 system, with a global EER of 6.73%, degraded to about 25% in condition C6 (high-quality replays), whereas our fingerprint-based antispoofer obtains an EER of 0.0% in that condition. This suggests that the two approaches are complementary, with acoustic antispoofers suited to low-to-mid quality replays and fingerprint-based ones to mid-to-high quality replays.
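To make the idea of comparing an access trial against a bank of previous recordings concrete, the following is a minimal sketch and not the system evaluated here. It assumes a heavily simplified landmark-style fingerprint (the strongest spectral bin per frame, paired with the strongest bins of the next few frames), and every function name and parameter (`spectrogram`, `fingerprint`, `replay_score`, `frame_len`, `hop`, `fan_out`) is hypothetical.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (illustrative settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fingerprint(signal, fan_out=3):
    """Set of (peak_bin, later_peak_bin, frame_gap) hashes.

    Assumption: one peak per frame is enough for a toy example; a practical
    fingerprinter would pick several robust peaks and keep time offsets.
    """
    peaks = spectrogram(signal).argmax(axis=1)
    hashes = set()
    for t in range(len(peaks)):
        for gap in range(1, fan_out + 1):
            if t + gap < len(peaks):
                hashes.add((int(peaks[t]), int(peaks[t + gap]), gap))
    return hashes

def replay_score(trial, bank):
    """Highest fraction of the trial's hashes found in any banked recording."""
    trial_fp = fingerprint(trial)
    if not trial_fp:
        return 0.0
    return max((len(trial_fp & fingerprint(prev)) / len(trial_fp)
                for prev in bank), default=0.0)
```

A trial whose score against the claimed speaker's bank is high would be flagged as a likely replay, while a fresh genuine utterance should share few hashes with any earlier recording. The decision threshold is left open here, since it would have to be tuned on development data.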
