RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing

This paper introduces RawBoost, a data boosting and augmentation method for the design of more reliable spoofing detection solutions which operate directly upon raw waveform inputs. RawBoost requires no additional data sources, such as noise recordings or impulse responses, and is data, application and model agnostic, although it is designed for telephony scenarios. Based upon the combination of linear and non-linear convolutive noise, impulsive signal-dependent additive noise and stationary signal-independent additive noise, RawBoost models nuisance variability stemming from, e.g., encoding, transmission, microphones and amplifiers, as well as both linear and non-linear distortion. Experiments performed using the ASVspoof 2021 logical access database show that RawBoost improves the performance of a state-of-the-art raw end-to-end baseline system by 27% relative, and that it is outperformed only by solutions that either depend on external data or require additional intervention at the model level.
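The three noise families named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the random FIR filters, the gain schedule and all default parameter values are assumptions chosen only to make the idea concrete. Each function takes a raw waveform and needs no external noise recordings or impulse responses.

```python
import numpy as np


def _normalize(x):
    """Scale the waveform so that its peak absolute value is at most 1."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def convolutive_noise(x, rng, n_taps=11, n_orders=3):
    """Linear and non-linear convolutive noise (illustrative assumption).

    Each power of the signal (x, x**2, x**3, ...) is passed through its own
    randomly perturbed FIR filter and the results are summed. Order 1 is the
    linear term; higher orders add non-linear distortion.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros_like(x)
    for order in range(1, n_orders + 1):
        h = rng.normal(scale=0.1, size=n_taps)  # hypothetical random filter
        h[n_taps // 2] += 1.0                   # close to an identity filter
        gain = 10.0 ** (-(order - 1))           # attenuate higher-order terms
        y += gain * np.convolve(x ** order, h, mode="same")
    return _normalize(y)


def impulsive_noise(x, rng, fraction=0.05, max_gain=0.5):
    """Impulsive, signal-dependent additive noise (illustrative assumption).

    A random subset of samples is perturbed by an amount proportional to the
    amplitude of the sample itself.
    """
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    n_hit = int(fraction * len(x))
    idx = rng.choice(len(x), size=n_hit, replace=False)
    y[idx] += rng.uniform(-max_gain, max_gain, size=n_hit) * x[idx]
    return _normalize(y)


def stationary_noise(x, rng, snr_db=20.0, n_taps=11):
    """Stationary, signal-independent additive noise (illustrative assumption).

    White Gaussian noise is coloured by a random FIR filter and added to the
    signal at the requested signal-to-noise ratio.
    """
    x = np.asarray(x, dtype=np.float64)
    noise = np.convolve(rng.normal(size=len(x)),
                        rng.normal(size=n_taps), mode="same")
    sig_pow = np.mean(x ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return _normalize(x + scale * noise)


def augment(x, rng):
    """Apply the three noise types in series (one possible combination)."""
    return stationary_noise(impulsive_noise(convolutive_noise(x, rng), rng), rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 220.0 * t)  # 1 s test tone standing in for speech
    noisy = augment(clean, rng)
    print(noisy.shape, float(np.max(np.abs(noisy))))
```

In this sketch the augmentations are applied on the fly to each training waveform, so a raw end-to-end model would see a differently distorted copy of every utterance in each epoch. Whether the noise types are applied in series, in parallel or individually is a design choice that the abstract does not specify.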