Instantaneous Phase and Excitation Source Features for Detection of Replay Attacks

Spoof detection has become an integral part of biometric systems, and speaker verification is no exception. Replay attacks are the most common, in which the attacker plays back recorded speech of a user to validate a false identity claim. Currently, a system based on constant-Q cepstral coefficient (CQCC) features represents the standalone benchmark for spoof detection. However, we hypothesize that the phase and excitation source information of speech may carry additional artifacts that are useful for identifying replay attacks. In this regard, instantaneous frequency cosine coefficients and two source features, namely the discrete cosine transform of the integrated linear prediction residual and the residual mel frequency cepstral coefficients, are explored. The studies are conducted on the ASVspoof 2017 Version 2.0 database, which is designed for replay attacks. The results reveal that although the phase and source features perform worse than CQCC, their fusion helps to achieve improved performance. This indicates that the information carried by these features is complementary and useful for detecting replay attacks. Further, an analysis of the behavior of each of these features under different replay configurations is presented to highlight their effect in different scenarios.
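To make the idea of an instantaneous-frequency-based feature concrete, the following is a minimal sketch and not the feature extraction used in this work: the abstract does not specify the processing chain, so the function names, the naive DFT, the single-band analysis, and the test tone are all illustrative assumptions. It builds the analytic signal by zeroing negative frequencies in the DFT domain, takes the sample-to-sample phase increment as the instantaneous frequency, and compresses that trajectory with a DCT-II to obtain cosine coefficients.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a one-sided spectrum (naive O(n^2) DFT, for clarity only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies, zero the rest.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    one_sided = [s * w for s, w in zip(spectrum, weights)]
    return [
        sum(one_sided[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the phase increment of the analytic signal."""
    z = analytic_signal(x)
    # angle(z[t+1] * conj(z[t])) is the phase step, so no explicit unwrapping is needed.
    return [
        cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
        for t in range(len(z) - 1)
    ]


def dct2(values, num_coeffs):
    """Unnormalized DCT-II, returning the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


# Hypothetical test signal: an 8 Hz cosine sampled at 64 Hz for one second.
fs = 64
tone = [math.cos(2 * math.pi * 8 * t / fs) for t in range(fs)]
inst_freq = instantaneous_frequency(tone, fs)
coeffs = dct2(inst_freq, 4)
print(inst_freq[:3], coeffs)
```

For a stationary tone the instantaneous frequency is flat, so nearly all of its energy falls into the zeroth cosine coefficient. A practical front end would typically work frame by frame on band-limited components of speech rather than on a whole signal, which is outside the scope of this sketch.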
