Syllable-based automatic Arabic speech recognition in noisy-telephone channel

Speech recognizers that are well trained on high-quality, full-bandwidth speech data usually perform worse when used in real-world environments. In particular, telephone speech recognition is extremely difficult due to the limited bandwidth of transmission channels. In this paper, we concentrate on the telephone recognition of Egyptian Arabic speech using syllables. Arabic spoken digits are described in terms of their constituent phonemes, triphones, syllables and words. A speaker-independent speech recognition system based on hidden Markov models (HMMs) was designed using the Hidden Markov Model Toolkit (HTK). The database used for both training and testing comprises forty-four Egyptian speakers. In a clean environment, experiments show that the recognition rate using syllables exceeded the rates obtained using monophones, triphones and words by 2.68%, 1.19% and 1.79% respectively. In a noisy telephone channel, the syllable-based recognition rate likewise exceeded the rates obtained using monophones, triphones and words by 2.09%, 1.5% and 0.9% respectively. These comparative experiments indicate that using syllables as acoustic units improves the recognition performance of HMM-based ASR systems in noisy environments. A syllable unit spans a longer time frame, typically three phones, and thereby offers a more parsimonious framework for modeling pronunciation variation in spontaneous speech. Moreover, syllable-based recognition uses relatively fewer units and runs faster than word-based recognition.
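To make the different acoustic units concrete, the following minimal Python sketch decomposes one word into monophones, word-internal triphones and syllables. It is illustrative only and is not the paper's implementation: the romanized phone sequence for the digit "one", the small vowel inventory, and the function names `to_triphones` and `to_syllables` are all assumptions introduced here. The triphone labels follow the HTK-style `left-phone+right` notation, and the syllabifier relies on the simplifying assumption that every syllable begins with a single consonant followed by a vowel.

```python
# Hypothetical, simplified vowel inventory (long vowels written doubled).
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


def to_triphones(phones):
    """Return word-internal triphones in HTK-style l-p+r notation."""
    triphones = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        triphones.append(left + phone + right)
    return triphones


def to_syllables(phones):
    """Split phones into syllables.

    Assumption: a new syllable starts at every consonant that is
    immediately followed by a vowel (CV, CVC, CVCC, ... patterns).
    """
    syllables = []
    for i, phone in enumerate(phones):
        next_is_vowel = i + 1 < len(phones) and phones[i + 1] in VOWELS
        starts_syllable = phone not in VOWELS and next_is_vowel
        if starts_syllable or not syllables:
            syllables.append([phone])
        else:
            syllables[-1].append(phone)
    return syllables


if __name__ == "__main__":
    # Illustrative romanization of the digit "one"; not the paper's phone set.
    word = ["w", "aa", "h", "i", "d"]
    print("monophones:", word)
    print("triphones: ", to_triphones(word))
    print("syllables: ", ["_".join(s) for s in to_syllables(word)])
```

Under these assumptions the five-phone word yields five monophone or triphone units but only two syllable units, which illustrates how a syllable spans a longer stretch of speech than a single phone.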
