Turn Segmentation into Utterances for Arabic Spontaneous Dialogues and Instance Messages

Text segmentation is an essential processing step for many Natural Language Processing (NLP) tasks, such as text summarization, text translation, and dialogue language understanding. Turn segmentation is a key component of dialogue understanding for building automatic human-computer systems. In this paper, we introduce a novel Machine Learning (ML) approach to segmenting turns into utterances for Egyptian spontaneous dialogues and Instant Messages (IM), as part of the broader task of automatically understanding Egyptian spontaneous dialogues and IM. Because no Egyptian dialect dialogue corpus is available, the system is evaluated on our own corpus of 3001 turns, which were collected, segmented, and annotated manually from Egyptian call centers. The system achieves an F1 score of 90.74% and an accuracy of 95.98%.
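To make the reported metrics concrete, the following is a minimal sketch of how F1 and accuracy could be computed if turn segmentation is framed as per-token boundary labeling (1 = the token ends an utterance, 0 = it does not). That framing, the function names `boundary_scores` and `split_turn`, and the toy English tokens are all assumptions for illustration; the abstract does not specify the label scheme or the evaluation procedure.

```python
from typing import List, Tuple


def split_turn(tokens: List[str], labels: List[int]) -> List[List[str]]:
    """Split a turn into utterances; label 1 marks the last token of an utterance."""
    utterances, current = [], []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == 1:
            utterances.append(current)
            current = []
    if current:  # keep trailing tokens that were never closed by a boundary
        utterances.append(current)
    return utterances


def boundary_scores(gold: List[int], pred: List[int]) -> Tuple[float, float, float, float]:
    """Return (precision, recall, F1, accuracy) over binary boundary labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(gold) if gold else 0.0
    return precision, recall, f1, accuracy


# Hypothetical toy turn (placeholder tokens, not taken from the paper's corpus).
tokens = ["hello", "sir", "my", "card", "stopped"]
gold = [0, 1, 0, 0, 1]  # reference boundaries
pred = [0, 1, 0, 1, 1]  # hypothetical system output with one spurious boundary

utterances = split_turn(tokens, gold)
precision, recall, f1, accuracy = boundary_scores(gold, pred)
```

Under this framing, accuracy counts every token, including the many non-boundary tokens, while F1 only rewards correctly placed boundaries, which is one reason the two figures can differ.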
