An experimental framework for Arabic digits speech recognition in noisy environments

In this paper we present ARADIGITS-2, an experimental framework for Arabic isolated digits speech recognition. The framework supports performance evaluation of Modern Standard Arabic digit recognition in a Distributed Speech Recognition system under noisy conditions at various Signal-to-Noise Ratio (SNR) levels. The data preparation and evaluation scripts follow a methodology similar to that of the AURORA-2 database. The original speech data contains a total of 2704 clean utterances, spoken by 112 native Algerian speakers (56 male and 56 female) and down-sampled to 8 kHz. The feature vectors, which consist of a set of Mel Frequency Cepstral Coefficients and log energy, are extracted from the speech samples using the ETSI Advanced Front-End (ETSI-AFE) standard, and the Hidden Markov Model (HMM) Toolkit is used to build the speech recognition engine. The recognition task is conducted in speaker-independent mode, considering both the word and the syllable as acoustic units. The HMM parameters and the temporal derivatives window are tuned through a series of experiments performed in two training modes: clean and multi-condition. Better results are obtained by exploiting the polysyllabic nature of Arabic digits. These results show the effectiveness of syllable-like units in building an Arabic digits recognition system: they exceed word-like units in overall Word Accuracy Rate by 0.44% and 0.58% for the clean and multi-condition training modes, respectively.
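To illustrate what evaluating "at various SNR levels" involves, the following is a minimal sketch of mixing a noise signal into a clean utterance at a target SNR. It is not the ARADIGITS-2 or AURORA-2 preparation script: the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the synthetic tone standing in for speech, and the listed SNR values are all assumptions made for illustration, and power is measured over the whole signal rather than over active speech only.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the requested SNR (dB).

    Simplifying assumption: power is computed over the entire signal,
    not over active-speech segments only.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 8000  # Hz, matching the 8 kHz data described above
    # Hypothetical stand-ins: a 440 Hz tone for speech, Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for snr in (20, 10, 0):  # illustrative levels only
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

A production pipeline would typically also apply channel filtering and measure the active speech level before scaling; those steps are omitted here for brevity.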
