Noise effect on Amazigh digits in speech recognition system

Automatic Speech Recognition (ASR) for Amazigh speech, particularly Moroccan Tarifit-accented speech, is an under-researched area. This paper analyzes and evaluates the first ten Amazigh digits under noisy conditions from an ASR perspective, based on the Signal-to-Noise Ratio (SNR). Our tests were performed under two types of noise and repeated with added environmental noise at various SNR levels for each type, ranging from 5 to 45 dB. A speaker-independent Amazigh speech recognition system was developed using formalisms such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs). The experimental results show that performance degraded for all digits, to different degrees, and that recognition rates decreased less under car noise than under grinder noise, with differences of 2.84% and 8.42% at SNRs of 5 dB and 25 dB, respectively. We also observed that the most affected digits are those containing the letter "S".
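To make the SNR-controlled test condition concrete, the following is a minimal sketch of how a noise recording can be scaled and added to clean speech so that the mixture reaches a chosen SNR. It is not the authors' pipeline: the function names (`power`, `snr_db`, `mix_at_snr`), the synthetic sine-wave "speech", the Gaussian "noise", and the three SNR levels in the loop are all assumptions made for illustration. It relies only on the standard definition SNR(dB) = 10 log10(P_speech / P_noise).

```python
import math
import random


def power(signal):
    """Mean power (average of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that the speech-to-noise ratio equals the target,
    then add it to `speech`. Returns (noisy_signal, scaled_noise)."""
    # The gain g must satisfy P_speech / (g^2 * P_noise) = 10^(SNR / 10).
    ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(speech) / (power(noise) * ratio))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


# Synthetic stand-ins (assumptions): a 200 Hz tone sampled at 16 kHz plays the
# role of a spoken digit, and white Gaussian noise plays the role of the
# recorded car or grinder noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# Illustrative SNR levels inside the 5 to 45 dB range used in the paper.
for target in (5, 25, 45):
    noisy, scaled = mix_at_snr(speech, noise, target)
    print(f"target {target} dB -> achieved {snr_db(speech, scaled):.2f} dB")
```

In an actual evaluation, the noisy signals produced this way would be passed to the recognizer and the recognition rate recorded per digit and per SNR level; lower SNR values correspond to louder noise relative to the speech.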
