'What is this corpus about?': Using topic modelling to explore a specialised corpus

This paper introduces topic modelling, a machine learning technique that automatically identifies ‘topics’ in a given corpus. The paper illustrates its use in the exploration of a corpus of academi...
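To show in concrete terms how a topic model "automatically identifies topics", here is a sketch of one widely used model, Latent Dirichlet Allocation (LDA), fitted by collapsed Gibbs sampling in plain Python. It is an illustration only and is not necessarily the procedure used in the paper. The function names `lda_gibbs` and `top_words`, the toy corpus, and the settings for `alpha`, `beta` and `iterations` are all hypothetical. Each token is repeatedly reassigned to a topic. The probability of each topic is proportional to two things: how often that topic already occurs in the token's document, and how often the topic already generates that word.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Sketch of LDA via collapsed Gibbs sampling (hypothetical helper).

    docs: list of documents, each a list of word tokens.
    Returns (doc-topic counts, topic-word counts, vocabulary list).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]       # tokens assigned to topic k in document d
    nkw = [[0] * V for _ in range(K)]   # occurrences of word v assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """List the n most frequent words in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


if __name__ == "__main__":
    corpus = [
        "gene protein cell gene dna".split(),
        "protein dna cell gene".split(),
        "court law judge law ruling".split(),
        "judge ruling court law".split(),
    ]
    doc_topics, topic_words, vocab = lda_gibbs(corpus, num_topics=2, iterations=100)
    for k, words in enumerate(top_words(topic_words, vocab)):
        print(k, words)
    print(doc_topics)
```

In this model a "topic" is a probability distribution over the vocabulary. It is usually interpreted by reading its highest-weighted words. Each document receives a mixture of topics, given by its row of `doc_topics`. With real corpora one would normally use an established implementation, such as the R package topicmodels, after preprocessing steps like stop-word removal or stemming.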
