Front-End for Antispoofing Countermeasures in Speaker Verification: Scattering Spectral Decomposition

As speaker verification is widely used to verify personal identity in commercial applications, the study of antispoofing countermeasures has become increasingly important. With an appropriate choice of spectral and prosodic feature mappings, spoofing methods based on voice conversion and on speech synthesis can both deceive speaker verification systems, which typically rely on these same features. Consequently, alternative front-ends are required for effective spoofing detection. This paper investigates the use of the recently proposed hierarchical scattering decomposition technique, which can be viewed as a generalization of all constant-Q spectral decompositions, to implement front-ends for stand-alone spoofing detection. The coefficients obtained using this decomposition are converted to a feature vector of Scattering Cepstral Coefficients (SCCs). We evaluate the performance of SCCs on the recent Spoofing and Anti-Spoofing (SAS) corpus as well as the ASVspoof 2015 challenge corpus, and show that SCCs are superior to all other front-ends that have previously been benchmarked on the ASVspoof corpus.
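As a rough illustration of the pipeline described above, the following sketch computes a two-level scattering decomposition with constant-Q Gaussian band-pass filters and converts it to cepstral-style features by taking a logarithm and a discrete cosine transform. It is not the authors' implementation: the filter shapes, the parameter values (`q1`, `j1`, `q2`, `j2`, `hop`, `n_ceps`), the function names, and the choice to apply one DCT per scattering level are all assumptions made for illustration. Unlike a full scattering transform, it also keeps every second-level path rather than pruning paths that carry little energy.

```python
import numpy as np

def gabor_bank(n, num_filters, q, f_max):
    """Frequency-domain Gaussian band-pass filters with constant-Q spacing.
    Centre frequencies are in cycles per sample (illustrative design)."""
    freqs = np.fft.fftfreq(n)
    centres = f_max * 2.0 ** (-np.arange(num_filters) / q)
    sigmas = centres * (2.0 ** (1.0 / q) - 1.0)  # bandwidth proportional to centre
    return np.exp(-0.5 * ((freqs[None, :] - centres[:, None]) / sigmas[:, None]) ** 2)

def lowpass(n, width):
    """Gaussian low-pass filter used for local time averaging."""
    return np.exp(-0.5 * (np.fft.fftfreq(n) / width) ** 2)

def scattering(x, q1=8, j1=32, q2=1, j2=6, hop=256):
    """Return first- and second-level scattering coefficients of a 1-D signal."""
    n = len(x)
    phi = lowpass(n, 1.0 / hop)
    psi1 = gabor_bank(n, j1, q1, f_max=0.4)
    psi2 = gabor_bank(n, j2, q2, f_max=0.05)

    # Level 1: band-pass, modulus, then low-pass averaging and subsampling.
    u1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * psi1, axis=1))
    u1_hat = np.fft.fft(u1, axis=1)
    s1 = np.real(np.fft.ifft(u1_hat * phi, axis=1))[:, ::hop]

    # Level 2: repeat the same operation on each level-1 envelope.
    s2 = []
    for k in range(j2):
        u2 = np.abs(np.fft.ifft(u1_hat * psi2[k], axis=1))
        s2.append(np.real(np.fft.ifft(np.fft.fft(u2, axis=1) * phi, axis=1))[:, ::hop])
    return s1, np.stack(s2)  # shapes (j1, T) and (j2, j1, T)

def dct_matrix(n_out, n_in):
    """Unnormalised DCT-II basis, truncated to the first n_out rows."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_in)

def scc(x, n_ceps=20, eps=1e-8):
    """Log + DCT of each scattering level, concatenated per frame."""
    s1, s2 = scattering(x)
    log_s1 = np.log(np.maximum(s1, 0.0) + eps)
    log_s2 = np.log(np.maximum(s2, 0.0) + eps).reshape(-1, s2.shape[-1])
    c1 = dct_matrix(n_ceps, log_s1.shape[0]) @ log_s1
    c2 = dct_matrix(n_ceps, log_s2.shape[0]) @ log_s2
    return np.concatenate([c1, c2], axis=0).T  # (frames, 2 * n_ceps)
```

Calling `scc` on a one-dimensional array of audio samples returns one feature vector per frame, which could then be passed to a back-end classifier. The log-then-DCT step mirrors how MFCCs are derived from mel filterbank energies, which is why a cepstral name is natural for these features.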
