Human-In-The-Loop Topic Modelling: Assessing topic labelling and genre-topic relations with a movie plot summary corpus

A much-used but not yet mainstream text analysis approach, topic modelling allows lexical themes to be identified across a document collection. Measured against principles for interpretable AI and sociotechnical design, it has definite strengths in its speed and its ability to discover structure. However, challenges remain in how its results can be interpreted, whether by analysts, domain experts, or potential end users. Automated coherence and labelling measures go some way toward closing the gap in understanding and trust, and user empowerment through visualisation and design intervention is beginning to show how the remaining ground might be made up. This study applies topic modelling to a corpus of Wikipedia movie plot summaries to illustrate both the challenges and the potential. Naive users found topic labelling easy in only a quarter of cases, and difficulty increased markedly with 100 topics compared with 50. While automated measures suggested 88 topics, the number users could manage was closer to 50. Comparing the unsupervised topic model with the movie genre labels indicated that the two could work well together: topics can complement genres, match content across genres, and highlight variability within a genre. It is suggested that unsupervised models may suit creativity and discovery better than semi-supervised versions.
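The genre-topic comparison described above can be illustrated with a minimal sketch. It assumes a fitted model has already produced per-document topic proportions and is not the study's own analysis. The function names (`genre_topic_profiles`, `top_topic`, `genres_sharing_topic`), the 0.25 threshold, and the toy proportions and genre sets are all hypothetical. Averaging topic weights within each genre gives a genre profile. The spread around that average is one simple indicator of within-genre variability. Topics with a high average weight in several genres point to content shared across genre boundaries.

```python
from collections import defaultdict
from statistics import mean, pstdev


def genre_topic_profiles(doc_topics, doc_genres):
    """Summarise per-document topic proportions by genre label.

    doc_topics: one list of topic proportions per document (e.g. rows of a
        fitted model's document-topic matrix; assumed to be precomputed).
    doc_genres: one set of genre labels per document; a film may carry
        several genres, so it contributes to each of them.
    Returns {genre: (means, stdevs)}, where means[k] is the average weight of
    topic k over that genre's documents and stdevs[k] is its population
    standard deviation, used here as a simple proxy for within-genre variability.
    """
    rows_by_genre = defaultdict(list)
    for theta, genres in zip(doc_topics, doc_genres):
        for genre in genres:
            rows_by_genre[genre].append(theta)

    profiles = {}
    for genre, rows in rows_by_genre.items():
        per_topic = list(zip(*rows))  # transpose: one tuple of weights per topic
        profiles[genre] = ([mean(w) for w in per_topic],
                           [pstdev(w) for w in per_topic])
    return profiles


def top_topic(profile):
    """Index of the topic with the highest mean weight in a genre profile."""
    means, _ = profile
    return max(range(len(means)), key=means.__getitem__)


def genres_sharing_topic(profiles, k, threshold):
    """Genres whose mean weight on topic k is at least `threshold` (hypothetical cut-off)."""
    return sorted(g for g, (means, _) in profiles.items() if means[k] >= threshold)


# Hypothetical toy data: four films, three topics (not from the study's corpus).
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
]
doc_genres = [{"Comedy"}, {"Comedy", "Romance"}, {"Romance"}, {"Horror"}]

profiles = genre_topic_profiles(doc_topics, doc_genres)
for genre in sorted(profiles):
    print(genre, "-> top topic", top_topic(profiles[genre]))
print("Topic 1 shared by:", genres_sharing_topic(profiles, 1, 0.25))
```

In this toy example the second film counts towards both Comedy and Romance. Topic 1 carries substantial weight in both genres, which is the kind of cross-genre match mentioned above. A larger standard deviation for a topic within one genre would flag films in that genre whose content departs from the genre's typical profile.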
