Thematic Analysis: A Corpus-Based Method for Understanding Themes/Topics of a Corpus through a Classification Process Using Long Short-Term Memory (LSTM)

Using advanced algorithms to conduct a thematic analysis reduces the time required and increases the efficiency of the analysis. Long short-term memory (LSTM) is effective for text classification and other natural language processing (NLP) tasks. In this study, we adopt LSTM for text classification in order to perform a thematic analysis of concordance lines taken from a corpus of news articles. The statistical and quantitative analyses of corpus linguistics alone, however, are not enough to fully identify the semantic shift of terms and concepts. We therefore suggest that a corpus should be classified from a linguistic theoretical perspective, as this helps to determine the level of linguistic patterns that should be applied in the classification experiment. We also suggest investigating the concordance lines of the articles rather than only the relationships between collocates, which has been a limitation of many studies. The findings highlight the effectiveness of the proposed methodology for the thematic analysis of media coverage, with the classifier reaching 84% accuracy. This method provides a deeper thematic analysis than applying the classification process through collocational analysis alone.
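
To make the distinction between concordance lines and collocates concrete, the following is a minimal sketch of how concordance lines (keyword-in-context windows) might be extracted before being passed to a classifier. It is not the study's implementation: the function name `concordance_lines`, the regex tokenizer, the `window` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def concordance_lines(text: str, node: str, window: int = 5) -> List[Tuple[str, str, str]]:
    """Return (left context, node word, right context) for each hit of `node`.

    Illustrative only: tokenization is a simple lowercase word split, and
    `window` is the number of tokens kept on each side of the node word.
    """
    tokens = re.findall(r"\w+", text.lower())
    node = node.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, not drawn from the study's corpus.
sample = "The virus spread quickly. Officials said the virus was contained."
for left, node, right in concordance_lines(sample, "virus", window=2):
    print(f"{left:>12} [{node}] {right}")
```

Each extracted line keeps the node word together with its surrounding context in order, whereas a collocate list records only which words co-occur with the node. Under the approach described here, it is these context windows that would be labeled with themes and used as input to the LSTM classifier.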
