Hiding speaker characteristics for security

Recent advances in information technology have simplified voice data transmission but have also brought major security threats. A transmitted voice signal conveys not only confidential information but also information about the speaker. Many security-sensitive domains require secrecy about speaker identity. This paper presents a method for hiding speaker identity. Altering acoustic features of the speaker's voice, mainly pitch and energy, is necessary to make it difficult for an adversary to infer who is speaking. A speaker normalization approach is used to accomplish this. Subjective and objective tests confirm that the normalized speech remains intelligible without revealing information about the speaker. The technique can be used in medical, military, and other domains in which keeping the speaker's identity secret is a major security concern.
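To make the idea of energy normalization concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that each fixed-length frame is simply rescaled to a common RMS level, so that loudness no longer distinguishes speakers. The names `normalize_energy`, `frame_len`, `target_rms`, and `floor` are illustrative assumptions, and pitch normalization is not covered here.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def normalize_energy(samples, frame_len=160, target_rms=0.1, floor=1e-4):
    """Rescale each frame to target_rms (hypothetical parameters).

    Frames whose level is at or below `floor` are treated as silence
    and left unchanged, so background noise is not amplified.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = rms(frame)
        gain = target_rms / level if level > floor else 1.0
        out.extend(s * gain for s in frame)
    return out

# Two synthetic "speakers": the same waveform at different loudness.
quiet = [0.05 * math.sin(2 * math.pi * 5 * n / 160) for n in range(320)]
loud = [0.8 * math.sin(2 * math.pi * 5 * n / 160) for n in range(320)]

quiet_norm = normalize_energy(quiet)
loud_norm = normalize_energy(loud)
# After normalization both signals have the same frame-level RMS.
print(rms(quiet_norm[:160]), rms(loud_norm[:160]))
```

A per-frame gain like this can introduce audible discontinuities at frame boundaries; a practical system would likely smooth the gain across frames, which is omitted here for brevity.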
