Investigating spoken Arabic digits in speech recognition setting

Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the ten digits, zero through nine. Except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits are investigated from the point of view of the speech recognition problem. A speech recognition system based on an artificial neural network was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters, and the signal was pre-emphasized, blocked into frames, and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance lengths and misalignments between phonemes. Frame features were extracted as Mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. The recognition system achieved 99.5% correct digit recognition in the multi-speaker mode and 94.5% in the speaker-independent mode. This paper also investigates Arabic digits as "patterns on paper", using spectrogram and waveform information to cross-check the results of the digit recognition system and to try to locate the causes of misrecognized digits. All Arabic digits are described in terms of their constituent phonemes and syllables. All possible pairs of digits were also compared, and the observations are linked to the output of the digit recognition system. An understanding of the causes of automatic digit recognition errors may help in building digit recognition systems that are simple, cheap, and fast.
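
The pre-emphasis and Hamming-window framing steps mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-emphasis coefficient (0.97), the frame length (256 samples), the hop size (128 samples), the 8 kHz sampling rate, and the function names `pre_emphasize` and `frame_and_window` are all assumptions chosen for illustration, since the abstract does not state these values. Band-pass filtering, time alignment, MFCC extraction, and the neural network classifier are left out.

```python
import numpy as np


def pre_emphasize(signal, alpha=0.97):
    """Apply the first-order filter y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value, not taken from the paper.
    The first sample is passed through unchanged.
    """
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[:1], signal[1:] - alpha * signal[:-1])


def frame_and_window(signal, frame_len=256, hop=128):
    """Block a signal into overlapping frames and apply a Hamming window.

    frame_len and hop are assumed values. Trailing samples that do not
    fill a complete frame are dropped; a signal shorter than one frame
    is zero-padded to exactly one frame.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


# Synthetic one-second tone standing in for a recorded digit (assumed 8 kHz).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

frames = frame_and_window(pre_emphasize(tone))
print(frames.shape)  # (61, 256)
```

Each row of `frames` is one windowed frame, which is the unit from which per-frame features such as MFCCs would then be computed.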
