Deep Sentiment Analysis: A Case Study on Stemmed Turkish Twitter Data

Sentiment analysis of stemmed Twitter data in various languages is an emerging research topic. In this paper, we apply three data augmentation techniques, namely Shift, Shuffle, and Hybrid, to increase the size of the training data. We then use three key types of deep learning (DL) models, namely the recurrent neural network (RNN), the convolutional neural network (CNN), and the hierarchical attention network (HAN), to classify stemmed Turkish Twitter data for sentiment analysis. The performance of these DL models is compared with that of existing traditional machine learning (TML) models. Stemming the data degrades the performance of the TML models, whereas the augmentation techniques greatly improve the performance of the DL models. Based on simulation, experimental, and statistical analysis of the results on identical datasets, we conclude that the TML models outperform the DL models with respect to both training-time (TTM) and runtime (RTM) complexity, but the DL models outperform the TML models with respect to the most important performance factors as well as the average performance rankings.
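
The abstract names the Shift, Shuffle, and Hybrid augmentation techniques without defining them. As a minimal sketch only, and not necessarily the procedure used in the paper, the code below assumes that Shift is a circular rotation of a tweet's stemmed tokens, Shuffle is a random permutation of them, and Hybrid applies both in sequence. The function names, the `k` and `seed` parameters, and the sample tweet are all hypothetical.

```python
import random


def shift(tokens, k=1):
    """Assumed 'Shift': rotate the token list k positions to the right."""
    if not tokens:
        return []
    k %= len(tokens)
    return list(tokens) if k == 0 else tokens[-k:] + tokens[:-k]


def shuffle(tokens, rng):
    """Assumed 'Shuffle': return a randomly permuted copy of the tokens."""
    out = list(tokens)
    rng.shuffle(out)
    return out


def hybrid(tokens, rng, k=1):
    """Assumed 'Hybrid': shift first, then shuffle the shifted tokens."""
    return shuffle(shift(tokens, k), rng)


def augment(dataset, seed=0, k=1):
    """Return the original samples plus one augmented copy per technique.

    `dataset` is a list of (tokens, label) pairs; labels are kept unchanged.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = []
    for tokens, label in dataset:
        augmented.append((list(tokens), label))
        augmented.append((shift(tokens, k), label))
        augmented.append((shuffle(tokens, rng), label))
        augmented.append((hybrid(tokens, rng, k), label))
    return augmented


# Illustrative sample: one already-stemmed tweet with a positive label.
train = [(["film", "çok", "güzel"], "positive")]
bigger_train = augment(train)
```

Under these assumptions each labeled tweet yields four training samples, so the training set grows fourfold while every augmented sample keeps the same bag of stems and the same label as its source.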
