Interactive Topic Modeling for aiding Qualitative Content Analysis

Topic Modeling algorithms are rarely used to support the qualitative content analysis process. This lack of mainstream adoption can be attributed to two factors: the perception that Topic Modeling produces topics of poor quality, and content analysts' lack of trust in the derived topics because they cannot supply domain knowledge or interact with the algorithm. In this paper, two interactive Topic Modeling algorithms, namely Dirichlet Forest Latent Dirichlet Allocation and Penalised Non-negative Matrix Factorisation, are evaluated with respect to their ability to aid qualitative content analysis. More specifically, the relationship between interactivity, interpretation, topic coherence and trust in interactive content analysis is examined. The findings indicate that providing content analysts with the ability to interact with Topic Modeling algorithms produces topics that are directly related to their research questions. However, a number of improvements to these algorithms were also identified, which have the potential to influence future algorithm development to better meet the requirements of qualitative content analysts.
