Towards a Hybrid Approach for Semantic Arabic Spontaneous Speech Analysis

Automatic speech understanding aims to extract the useful meaning of spoken utterances. In this paper, we propose an original hybrid method for robust automatic understanding of Arabic speech. The method combines two approaches that are usually used separately and are not considered complementary. This hybridization makes the analysis robust to the irregularities of spoken language, such as non-fixed word order and disfluencies (self-corrections, repetitions and false starts). The combination also lets us handle structural complexities of Arabic sentences themselves, such as conditional, concessive, emphatic, negative and elliptical constructions. We give a detailed description of our approach and compare its results with those of several systems that use the different approaches separately. The observed error rates suggest that our combined approach is competitive with concept spotters on larger application domains. We also present our corpus, inspired by the MEDIA and LUNA project corpora and collected with the Wizard of Oz method; it covers Arabic tourist information and hotel reservation. The evaluation results of our hybrid spontaneous speech analysis method are very encouraging: it obtains an F-measure of 79.98%.
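
The F-measure reported above is, in its usual definition, the harmonic mean of precision and recall. The sketch below shows that computation, assuming the evaluation counts true positives, false positives and false negatives over annotated semantic units such as concepts. The `Counts` class, the function names and the example numbers are hypothetical illustrations, not the paper's evaluation protocol or data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical counts of system output compared against a reference annotation."""
    true_positives: int   # hypothesized units that match the reference
    false_positives: int  # hypothesized units absent from the reference
    false_negatives: int  # reference units the system missed


def precision(c: Counts) -> float:
    # Fraction of the system's output that is correct.
    predicted = c.true_positives + c.false_positives
    return c.true_positives / predicted if predicted else 0.0


def recall(c: Counts) -> float:
    # Fraction of the reference units that the system found.
    relevant = c.true_positives + c.false_negatives
    return c.true_positives / relevant if relevant else 0.0


def f_measure(c: Counts, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 is the harmonic mean."""
    p, r = precision(c), recall(c)
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper's evaluation.
    counts = Counts(true_positives=60, false_positives=15, false_negatives=25)
    print(f"P={precision(counts):.4f} R={recall(counts):.4f} F={f_measure(counts):.4f}")
```

With these illustrative counts, precision is 0.8, recall is about 0.706, and the F-measure is 0.75. Because it is a harmonic mean, the F-measure stays close to the weaker of the two scores, so a system cannot reach a high value by trading all of its recall for precision.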
