论文信息 - Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection

Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection

Speech Synthesis (SS) and Voice Conversion (VC) presents a genuine risk of attacks for Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique using Convolutional Restricted Boltzmann Machine (ConvRBM) as a frontend feature representation. ConvRBM is trained on training subset of ASV spoof 2015 challenge database. Analyzing the filterbank trained on this dataset shows that ConvRBM learned more low-frequency subband filters compared to training on natural speech database such as TIMIT. The spoofing detection experiments were performed using Gaussian Mixture Models (GMM) as a back-end classifier. ConvRBM-based cepstral coefficients (ConvRBM-CC) perform better than hand crafted Mel Frequency Cepstral Coefficients (MFCC). On the evaluation set, ConvRBM-CC features give an absolute reduction of 4.76 % in Equal Error Rate (EER) compared to MFCC features. Specifically, ConvRBM-CC features significantly perform better in both known attacks (1.93 %) and unknown attacks (5.87 %) compared to MFCC features.

Madhu R. Kamble | Hemant A. Patil | Hardik B. Sailor

[1] Yannis Stylianou,et al. Voice Transformation: A survey , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2] Nicholas W. D. Evans,et al. Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification , 2017, Comput. Speech Lang..

[3] Hemant A. Patil,et al. Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech , 2015, INTERSPEECH.

[4] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[5] Goutam Saha,et al. Overview of BTAS 2016 speaker anti-spoofing competition , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[6] Themos Stafylakis,et al. Spoofing Detection on the ASVspoof2015 Challenge Corpus Employing Deep Neural Networks , 2016, Odyssey.

[7] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[8] Hemant A. Patil,et al. Filterbank learning using Convolutional Restricted Boltzmann Machine for speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9] M. Wagner,et al. Vulnerability of speaker verification to voice mimicking , 2004, Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004..

[10] Aleksandr Sizov,et al. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge , 2015, INTERSPEECH.

[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12] John H. L. Hansen,et al. An Investigation of Deep-Learning Frameworks for Speaker Verification Antispoofing , 2017, IEEE Journal of Selected Topics in Signal Processing.

[13] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[14] Kai Yu,et al. End-to-end spoofing detection with raw waveform CLDNNS , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16] Richard M. Stern,et al. Hearing Is Believing: Biologically Inspired Methods for Robust Automatic Speech Recognition , 2012, IEEE Signal Processing Magazine.

[17] Aleksandr Sizov,et al. Classifiers for synthetic speech detection: a comparison , 2015, INTERSPEECH.

[18] Nicholas W. D. Evans,et al. A New Feature for Automatic Speaker Verification Anti-Spoofing: Constant Q Cepstral Coefficients , 2016, Odyssey.

[19] Hemant A. Patil,et al. Cochlear Filter and Instantaneous Frequency Based Features for Spoofed Speech Detection , 2017, IEEE Journal of Selected Topics in Signal Processing.

[20] Nicholas W. D. Evans,et al. Re-assessing the threat of replay spoofing attacks against automatic speaker verification , 2014, 2014 International Conference of the Biometrics Special Interest Group (BIOSIG).

[21] Tomi Kinnunen,et al. A comparison of features for synthetic speech detection , 2015, INTERSPEECH.

[22] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Bo Chen,et al. Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge , 2015, INTERSPEECH.

[24] Richard M. Stern,et al. Features Based on Auditory Physiology and Perception , 2012, Techniques for Noise Robustness in Automatic Speech Recognition.

[25] Hemant A. Patil,et al. Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[26] Hemant A. Patil,et al. Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition , 2016, INTERSPEECH.

[27] Kai Yu,et al. Deep features for automatic spoofing detection , 2016, Speech Communication.

[28] Haizhou Li,et al. Spoofing and countermeasures for speaker verification: A survey , 2015, Speech Commun..

[29] Aleksandr Sizov,et al. ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge , 2017, IEEE Journal of Selected Topics in Signal Processing.

[30] Hardik B Sailor,et al. Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition. , 2017, The Journal of the Acoustical Society of America.