Blind detection of electronic disguised voice

Voice disguise undermines the authenticity of audio evidence in forensics and is increasingly used in illegal applications, so it is important to determine whether a suspected voice has been disguised. However, research on such detection has not been reported. In this paper, we focus on blind detection of electronic disguised voice. Statistical moments of Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features of speech signals. We then propose a detection approach that feeds these features to Support Vector Machine (SVM) classifiers. Extensive experiments show that detection rates higher than 95% can be achieved, indicating that the proposed approach performs well.
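As a rough illustration of the feature-extraction step, the sketch below collapses a per-frame MFCC matrix into one fixed-length vector of statistical moments, which could then be passed to an SVM classifier. It is only a sketch under stated assumptions: the abstract does not say which moments are used, so the choice of mean, variance, and skewness is an assumption, and the function name `column_moments` and the toy input are hypothetical. MFCC extraction and SVM training are not shown.

```python
from typing import List, Sequence


def column_moments(frames: Sequence[Sequence[float]]) -> List[float]:
    """Summarise an MFCC matrix (frames x coefficients) as moments.

    For each coefficient index, compute the mean, the population
    variance, and the skewness across all frames, then concatenate
    them. The choice of these three moments is an assumption made
    for illustration; the abstract does not specify them.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    features: List[float] = []
    for k in range(n_coeffs):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n_frames
        var = sum((x - mean) ** 2 for x in column) / n_frames
        if var > 0:
            third = sum((x - mean) ** 3 for x in column) / n_frames
            skew = third / var ** 1.5
        else:
            # A constant coefficient has no defined skewness; use 0.0.
            skew = 0.0
        features.extend([mean, var, skew])
    return features


# Hypothetical toy input: 3 frames, 2 coefficients per frame.
toy_mfcc = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
features = column_moments(toy_mfcc)
```

The resulting vector has three entries per MFCC coefficient regardless of utterance length, which is what makes it usable as input to a fixed-dimension classifier such as an SVM.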
