Genre analysis of movies using a topic model of plot summaries

Genre plays an important role in the description, navigation, and discovery of movies, but it is rarely studied at large scale using quantitative methods. We apply unsupervised topic modeling to a large collection of textual movie summaries and then use the model's topic proportions to investigate key questions in genre, including recognizability, mapping, canonicity, and change over time. This approach allows an analysis of how genre labels are applied, how genres are composed, how their ingredients change, and how genres compare with one another. We find that many genres can be predicted quite easily from their lexical signatures, which also define their position on the genre landscape. We find significant changes in genre composition between periods for westerns, science fiction, and road movies, reflecting changes in production and consumption values. On canonicity, we show that canonical examples often sit at the high end of the genre's topic distribution profile rather than at its center, where categorization theory might predict them to be.
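As a rough illustration of how topic proportions can support the recognizability and canonicity questions above, the following minimal sketch works on a toy document-topic matrix. Everything in it is an assumption made for illustration: the film names, the three-topic proportions, the nearest-centroid rule (a simple stand-in, not necessarily the classifier used in the study), and the helper names `genre_centroids`, `nearest_genre`, and `signature_rank`.

```python
from statistics import mean

# Hypothetical toy data: film -> (genre label, topic proportions).
# Each proportion vector stands in for one row of a topic model's
# document-topic matrix and sums to 1.
films = {
    "western_a": ("western", [0.70, 0.20, 0.10]),
    "western_b": ("western", [0.55, 0.25, 0.20]),
    "western_c": ("western", [0.90, 0.05, 0.05]),
    "scifi_a": ("scifi", [0.10, 0.75, 0.15]),
    "scifi_b": ("scifi", [0.20, 0.60, 0.20]),
}


def genre_centroids(films):
    """Average the topic proportions of all films sharing a genre label."""
    by_genre = {}
    for genre, props in films.values():
        by_genre.setdefault(genre, []).append(props)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in by_genre.items()}


def nearest_genre(props, centroids):
    """Predict a genre as the centroid closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda g: sq_dist(props, centroids[g]))


def signature_rank(title, films, topic):
    """Fraction of same-genre films whose share of `topic` is <= this film's.

    A value near 1.0 places the film at the high end of the genre's
    distribution for that topic; a value near 0.5 places it mid-range.
    """
    genre, props = films[title]
    shares = [p[topic] for g, p in films.values() if g == genre]
    return sum(s <= props[topic] for s in shares) / len(shares)


centroids = genre_centroids(films)
predicted = nearest_genre([0.80, 0.10, 0.10], centroids)
rank_c = signature_rank("western_c", films, topic=0)
```

In this toy setup, a film whose rank on its genre's dominant topic is close to 1.0 sits at the extreme of the genre rather than near its centroid, which is the kind of pattern the canonicity finding describes.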
