Multiclass Event Classification from Text

Social media has become one of the most popular sources of information. People communicate with each other, share their ideas, and comment on global issues and events in a multilingual environment. Although social media has been popular for several years, the increasing use of local languages on the web has recently driven an exponential rise in online data volume. This allows NLP researchers to exploit the richness of different languages while addressing the challenges these languages pose. Urdu is one of the most widely used local languages on social media. In this paper, we present the first-ever event detection approach for Urdu text. Multiclass event classification is performed with popular deep learning (DL) models, namely a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Deep Neural Network (DNN). One-hot encoding, word embedding, and term frequency-inverse document frequency (TF-IDF) feature vectors are used to evaluate the DL models. The dataset used for the experimental work consists of 103,965 labeled sentences. The DNN classifier achieved a promising accuracy of 84% in extracting and classifying events in Urdu-script text.
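To make two of the sparse feature representations named above concrete, here is a minimal, standard-library sketch of token-level one-hot encoding and TF-IDF vectors. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify tokenization or the TF-IDF variant, so this sketch assumes whitespace tokenization, term frequency normalized by sentence length, and idf = log(N / df). The corpus and the helper names `build_vocab`, `one_hot`, and `tf_idf` are hypothetical. Word embeddings are not shown.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc.split():  # assumption: whitespace tokenization
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(doc, vocab):
    """Encode a sentence as a sequence of one-hot vectors, one per known token."""
    seq = []
    for tok in doc.split():
        if tok in vocab:
            vec = [0] * len(vocab)
            vec[vocab[tok]] = 1  # exactly one active position per token
            seq.append(vec)
    return seq


def tf_idf(docs, vocab):
    """Return one TF-IDF row per sentence.

    Assumed variant: tf = count / sentence length, idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # document frequency counts each token once per sentence
    rows = []
    for doc in docs:
        toks = doc.split()
        counts = Counter(toks)
        vec = [0.0] * len(vocab)
        for tok, c in counts.items():
            vec[vocab[tok]] = (c / len(toks)) * math.log(n / df[tok])
        rows.append(vec)
    return rows


# Hypothetical toy corpus: three short Urdu sentences
# ("earthquake in Pakistan", "cricket match in Lahore", "rain in Karachi").
docs = ["پاکستان میں زلزلہ", "لاہور میں کرکٹ میچ", "کراچی میں بارش"]
vocab = build_vocab(docs)
onehot_seq = one_hot(docs[0], vocab)
tfidf = tf_idf(docs, vocab)
```

Because "میں" ("in") occurs in every sentence, its idf is log(3/3) = 0 and it receives zero TF-IDF weight, while event-bearing words such as "زلزلہ" ("earthquake") keep a positive weight. In practice, a library implementation (for example, scikit-learn's `TfidfVectorizer`, which smooths idf by default) would typically replace hand-written code like this.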
