Faces, Fights, and Families: Topic Modeling and Gendered Themes in Two Corpora of Swedish Prose Fiction

This paper explores topic modeling (TM) as a tool for “distant reading” of two Swedish literary corpora. We investigate what kinds of insight a TM-based approach can contribute to Swedish literary history, and what methodological difficulties it raises. The topic models are built on 12- and 24-term chunks of selected verb and common noun lemmas, and we generate models with 20, 40, and 100 topics. We also propose a method for quantitative and qualitative gendered thematic analysis that combines TM with a study of how the topics relate to the gender of characters and authors. The two corpora contain, respectively, Swedish classics (1821–1941) and recent bestsellers (2004–2017). We find that most of the topics proposed by the TM are easy to interpret as conceptual themes, and that the “same” themes appear in both corpora and across different TM settings. The study also yields observations about how topic distribution relates to different aspects of gender.
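
The chunking step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `chunk_lemmas`, the `(lemma, pos)` input format, the tag names `VB` and `NN` for verbs and common nouns, and the decision to keep a shorter final chunk are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple


def chunk_lemmas(
    tagged: Iterable[Tuple[str, str]],
    chunk_size: int,
    keep_pos: FrozenSet[str] = frozenset({"VB", "NN"}),  # assumed tag names
) -> List[List[str]]:
    """Keep only lemmas with the selected POS tags, then split them
    into consecutive chunks of `chunk_size` terms.

    Assumption: a shorter trailing chunk is kept rather than dropped;
    the abstract does not say how the remainder is handled.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    kept = [lemma for lemma, pos in tagged if pos in keep_pos]
    return [kept[i:i + chunk_size] for i in range(0, len(kept), chunk_size)]


# Hypothetical tagged input: (lemma, part-of-speech) pairs.
tagged = [
    ("hon", "PN"), ("se", "VB"), ("ansikte", "NN"), ("Anna", "PM"),
    ("slå", "VB"), ("familj", "NN"), ("och", "KN"), ("gå", "VB"),
]

# A chunk size of 2 is used here only to keep the example small;
# the paper uses 12 and 24.
chunks = chunk_lemmas(tagged, chunk_size=2)
print(chunks)
```

Each resulting chunk would then be treated as one "document" by the topic model, which is then fitted with the chosen number of topics (20, 40, or 100 in the paper).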
