Automatic Talker Identification Using Optimal Spectral Resolution: Application in noisy environment and telephony

This chapter deals with the problem of speaker characterization, with the principal aim of improving talker identification techniques. For this purpose, the authors investigate the effect of spectral resolution on speaker identification performance. The investigation employs an approach based on second-order statistical measures applied to Mel Frequency Spectral Coefficients (MFSC) and looks for the best spectral resolution (the optimal number of MFSC). Researchers generally prefer low spectral resolutions for many justifiable reasons, but it is not known which resolution is best, especially for talker identification, nor what performance is obtained with high spectral resolutions. To find the optimal resolution in both microphone and telephone bandwidths, the authors experimented with several dimensions of the MFSC vectors and several types of additive noise at several SNRs. Results show the importance of high spectral resolution in noisy environments and telephone bandwidth, whereas current research has always favoured the low resolution of 24 coefficients in
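The noisy test conditions mentioned in the abstract (additive noise at several SNRs) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `add_noise_at_snr` and `measured_snr_db`, the power-based SNR definition, and the choice to loop the noise recording to the speech length are all assumptions made here for illustration.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(v * v for v in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the requested SNR.

    Hypothetical helper: SNR is defined here as
    10 * log10(speech power / noise power), measured over the whole signal.
    """
    # Repeat the noise so it covers the speech, then trim to the same length.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]

    # Choose the gain so that speech_power / (gain^2 * noise_power)
    # equals 10^(snr_db / 10).
    target_ratio = 10 ** (snr_db / 10)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB recovered from a clean signal and its noisy version."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))
```

In an experiment of the kind described, each test utterance would be passed through such a function once per noise type and SNR before feature extraction, so that identification scores can be compared across spectral resolutions under identical noise conditions.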
