Mitigating effects of noise in Forensic Speaker Recognition

Among biometric modalities, speech plays a significant role in forensics, since a recording of a speaker's voice can serve as strong evidence in many crimes. In Forensic Speaker Recognition (FSR), such speech evidence is used to recognize its speaker. Compared with the speech samples available in other speaker recognition applications, forensic speech may be distorted in unpredictable ways. Major factors that degrade forensic speech quality include noise, speech coding, channel effects, the presence of multiple speakers, voice disguise, voice forgery, and limited speech duration. This paper designs and implements a forensic speaker verification system that is highly robust to noise. Robustness is achieved by employing an iterative threshold-based Voice Activity Detector (VAD) in the preprocessing stage and Gammatone Frequency Cepstral Coefficients (GFCC) as the feature set. Classification uses Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM). The performance of the proposed FSR system is evaluated under different signal-to-noise ratio (SNR) conditions using the equal error rate (EER). Its verification accuracy is also analysed and compared with that obtained using Linear Predictive Coding (LPC) coefficients and Mel Frequency Cepstral Coefficients (MFCC) as features.
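
The front end described above (a threshold-based VAD followed by GFCC extraction) can be sketched as follows. This is not the paper's implementation. The abstract gives no details of either stage, so everything here is an assumption. That includes the function names (frame_signal, iterative_threshold_vad, erb_space, gfcc), the isodata-style midpoint rule used for the iterative threshold, the 25 ms / 10 ms framing at 16 kHz, the 32 gammatone channels, the 64 ms FIR truncation and the 13 cepstral coefficients. The GFCC recipe used is a common one: gammatone filterbank energies, cube-root compression, then a DCT-II.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def iterative_threshold_vad(x, frame_len=400, hop=160, max_iter=50, tol=1e-3):
    """Hypothetical iterative-threshold VAD; True marks a speech frame.

    The threshold on frame log-energy starts at the mean and is moved to the
    midpoint of the mean above-threshold and mean below-threshold energies
    until it stops changing (assumed rule, not taken from the paper).
    """
    log_e = 10 * np.log10(np.mean(frame_signal(x, frame_len, hop) ** 2, axis=1) + 1e-12)
    thr = log_e.mean()
    for _ in range(max_iter):
        above, below = log_e[log_e > thr], log_e[log_e <= thr]
        if above.size == 0 or below.size == 0:
            break
        new_thr = 0.5 * (above.mean() + below.mean())
        converged = abs(new_thr - thr) < tol
        thr = new_thr
        if converged:
            break
    return log_e > thr

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg and Moore) in Hz."""
    return 24.7 * (4.37 * f / 1000 + 1)

def erb_space(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    def rate(f):
        return 21.4 * np.log10(4.37 * f / 1000 + 1)
    e = np.linspace(rate(f_lo), rate(f_hi), n)
    return (10 ** (e / 21.4) - 1) * 1000 / 4.37

def gfcc(x, fs, n_filters=32, n_ceps=13, frame_len=400, hop=160, f_lo=50.0):
    """GFCC-style features: gammatone energies, cube-root compression, DCT-II."""
    t = np.arange(int(0.064 * fs)) / fs  # 64 ms FIR truncation of each filter
    energies = []
    for fc in erb_space(f_lo, 0.9 * fs / 2, n_filters):
        # 4th-order gammatone impulse response with bandwidth 1.019 * ERB(fc)
        g = t ** 3 * np.exp(-2 * np.pi * 1.019 * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
        g /= np.sqrt(np.sum(g ** 2))
        y = np.convolve(x, g)[: len(x)]
        energies.append(np.mean(frame_signal(y, frame_len, hop) ** 2, axis=1))
    compressed = np.cbrt(np.stack(energies, axis=1))  # (n_frames, n_filters)
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * k[None, :] + 1) / (2 * n_filters))
    return compressed @ dct.T  # (n_frames, n_ceps)
```

Both functions frame the signal identically, so `gfcc(x, fs)[iterative_threshold_vad(x)]` keeps only the frames the VAD marks as speech.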

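The back end (GMMs adapted from a UBM, scored and summarised by the EER under several SNR conditions) can be sketched in the same hedged way. The sketch makes several assumptions:

- The adaptation is mean-only MAP adaptation with a relevance factor of 16, a common choice in GMM-UBM systems but not stated in the abstract.
- Covariances are diagonal.
- UBM weights and variances are reused unchanged.
- The EER is taken on the grid of observed scores.

All function names are hypothetical, and UBM training itself is omitted. A UBM trained elsewhere, for instance with scikit-learn's `GaussianMixture(covariance_type="diag")`, would supply `weights_`, `means_` and `covariances_`.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k: (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of X under a diagonal-covariance GMM."""
    lp = log_gauss_diag(X, means, variances) + np.log(weights)
    m = lp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(lp - m).sum(axis=1))))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of UBM means to enrolment frames X."""
    lp = log_gauss_diag(X, means, variances) + np.log(weights)
    post = np.exp(lp - lp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)               # responsibilities, (T, K)
    n_k = post.sum(axis=0)                                 # soft counts, (K,)
    ex = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]    # posterior means, (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]             # data-dependent weight
    return alpha * ex + (1 - alpha) * means

def llr_score(X, weights, ubm_means, spk_means, variances):
    """Log-likelihood ratio of the adapted speaker model against the UBM."""
    return (gmm_avg_loglik(X, weights, spk_means, variances)
            - gmm_avg_loglik(X, weights, ubm_means, variances))

def eer(target_scores, nontarget_scores):
    """Approximate EER: the threshold where false accept and false reject rates meet."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([np.mean(nontarget_scores >= th) for th in thresholds])
    frr = np.array([np.mean(target_scores < th) for th in thresholds])
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)

def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that clean + scaled noise has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    scale = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

In an evaluation of this kind, target and non-target trials would each be scored with `llr_score` and the scores passed to `eer`. Repeating this after corrupting the test speech with `add_noise_at_snr` at several SNRs yields an EER-versus-SNR comparison across feature sets.
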