Co-word Maps and Topic Modeling: A Comparison from a User's Perspective

Induced by “big data,” “topic modeling” has become an attractive alternative to mapping co-words in terms of co-occurrences and co-absences using network techniques. We return to the word/document matrix, first using a single text with a strong argument (“The Leiden Manifesto”) and then scaling up to a sample of moderate size (n = 687), to study the pros and cons of the two approaches in terms of the resulting possibilities for making semantic maps that can serve an argument. The results from co-word mapping (using two different routines) versus topic modeling are significantly uncorrelated. Whereas components in the co-word maps can easily be designated, coloring the nodes according to the results of the topic model provides maps that are difficult to interpret. In these samples, the topic models seem to reveal similarities other than semantic ones (e.g., linguistic ones). In other words, topic modeling does not replace co-word mapping.
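
As an illustration of the word/document matrix that both approaches start from, the following is a minimal sketch of how co-occurrence counts and a similarity between two words could be derived from it. The function names, the whitespace tokenization, the binary weighting, and the toy documents are assumptions made for this example; they are not the routines used in the study. Note that the cosine used here reflects co-occurrences only; a measure such as the Pearson correlation would also take co-absences into account.

```python
from collections import Counter
from itertools import combinations
import math


def word_document_matrix(docs):
    """Return (vocab, matrix); matrix[d][w] is 1 if word w occurs in document d."""
    tokenized = [set(doc.lower().split()) for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted(set().union(*tokenized))
    matrix = [[1 if word in tokens else 0 for word in vocab] for tokens in tokenized]
    return vocab, matrix


def cooccurrences(vocab, matrix):
    """Count, for each word pair, the number of documents containing both words."""
    counts = Counter()
    for row in matrix:
        present = [word for word, flag in zip(vocab, row) if flag]
        # vocab is sorted, so each pair is stored once in alphabetical order
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def cosine(vocab, matrix, a, b):
    """Cosine similarity between the document profiles (columns) of words a and b."""
    ia, ib = vocab.index(a), vocab.index(b)
    col_a = [row[ia] for row in matrix]
    col_b = [row[ib] for row in matrix]
    dot = sum(x * y for x, y in zip(col_a, col_b))
    # for binary columns the sum of squares equals the plain sum
    norm = math.sqrt(sum(col_a)) * math.sqrt(sum(col_b))
    return dot / norm if norm else 0.0


# Toy documents, invented purely for illustration
docs = [
    "metrics support expert judgment",
    "metrics inform judgment",
    "topic models cluster words",
]
vocab, matrix = word_document_matrix(docs)
counts = cooccurrences(vocab, matrix)
```

The resulting pair counts (or the cosine-normalized values) can serve as edge weights in a network layout, which is the kind of input a co-word map is drawn from.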
