A Speech Act Classifier for Persian Texts and its Application in Identifying the Speech Acts of Rumors

Speech acts (SAs) are an important area of pragmatics: they convey the intended language function of an utterance and give a better understanding of the speaker's state of mind. Knowing the SA of a text can help in analyzing that text in natural language processing applications. This study presents a dictionary-based statistical technique for Persian SA recognition. The proposed technique classifies a text into seven SA classes based on four types of features: lexical, syntactic, semantic, and surface. WordNet is used to extract synonyms and enrich the feature dictionary. To evaluate the proposed technique, we used four classification methods: Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbors (KNN). The experimental results demonstrate that the proposed method, with RF and SVM as the best classifiers, achieves state-of-the-art performance with an accuracy of 0.95 for the classification of Persian SAs. Our original motivation for this work was to introduce an application of SA recognition to social media content, in particular identifying the common SAs in rumors. We therefore used the proposed system to determine the common SAs in rumors. The results show that Persian rumors are most often expressed in three SA classes, namely narrative, question, and threat, and in some cases with the request SA.
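
To make the idea of dictionary-based SA classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical cue-word dictionary (English placeholders, not the Persian lexicon and without any WordNet enrichment), covers only the four SA classes named above for rumors, uses one illustrative surface feature, and applies a plain K-Nearest Neighbors vote, one of the four classifier families mentioned. All names (CUE_WORDS, features, knn_predict, TRAIN) and the training sentences are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical cue-word dictionary, one entry per speech-act class.
# English placeholders only; the paper's Persian dictionary is not reproduced here.
CUE_WORDS = {
    "narrative": {"was", "said", "reported", "yesterday"},
    "question": {"who", "what", "why", "how", "is", "?"},
    "request": {"please", "share", "send", "help"},
    "threat": {"will", "regret", "warn", "or", "else"},
}
CLASSES = sorted(CUE_WORDS)  # fixed feature order

def tokenize(text):
    # Lowercase and split question marks off as separate tokens.
    return text.lower().replace("?", " ? ").split()

def features(text):
    # Lexical features: number of tokens matching each class's cue words.
    tokens = tokenize(text)
    counts = [float(sum(t in CUE_WORDS[c] for t in tokens)) for c in CLASSES]
    # One illustrative surface feature: does the text end with "?".
    surface = 1.0 if text.strip().endswith("?") else 0.0
    return counts + [surface]

def knn_predict(train, text, k=3):
    # Majority vote among the k training texts closest in feature space.
    x = features(text)
    ranked = sorted(train, key=lambda item: math.dist(features(item[0]), x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy training set (two examples per class).
TRAIN = [
    ("Officials said the bridge was closed yesterday", "narrative"),
    ("The agency reported that schools reopened", "narrative"),
    ("Why is the bank closed today ?", "question"),
    ("Who said the dam failed?", "question"),
    ("Please share this message with everyone", "request"),
    ("Send help to the village please", "request"),
    ("Pay now or else you will regret it", "threat"),
    ("We warn you , you will regret this", "threat"),
]

if __name__ == "__main__":
    for sentence in ["What is happening in the city?",
                     "Leave or else you will regret it"]:
        print(sentence, "->", knn_predict(TRAIN, sentence))
```

In a realistic setting the feature vector would also carry syntactic and semantic features, and the dictionary would be far larger; the sketch only shows how dictionary hits can be turned into a vector that a standard classifier consumes.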
