Paper information - Comparative evaluation of two Arabic speech corpora

Comparative evaluation of two Arabic speech corpora

This paper presents a constructive, comparative evaluation of two important Arabic speech corpora covering two different Arabic dialects: a Saudi dialect corpus collected by King Abdulaziz City for Science and Technology (KACST), and a Levantine Arabic dialect corpus produced by the Linguistic Data Consortium (LDC). The Levantine dialect is spoken by ordinary Lebanese, Jordanian, Syrian, and Palestinian people. The advantages and disadvantages of the two corpora are presented and discussed. The discussion is intended to help digital speech processing researchers identify the strengths and weaknesses of these corpora before using them in their experiments. Moreover, the paper may motivate work on designing, maintaining, distributing, and upgrading Arabic corpora to support the Arabic speech research community.

Yousef Ajami Alotaibi | Ali Hamid Meftah
