Automatic Labelling of Topics with Neural Embeddings

Topics generated by topic models are typically represented as lists of terms. To reduce the cognitive overhead of interpreting these topics for end-users, we propose labelling a topic with a succinct phrase that summarises its theme or idea. Using Wikipedia document titles as label candidates, we compute neural embeddings for documents and words to select the most relevant labels for topics. Compared with a state-of-the-art topic labelling system, our methodology is simpler and more efficient, and finds better topic labels.
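
The selection step described above can be illustrated with a minimal sketch. It assumes that each candidate label (a Wikipedia title) and each topic term already has an embedding, and that a label's relevance is the average cosine similarity between its embedding and the embeddings of the topic's top terms. The abstract does not say how relevance is computed, so this scoring rule, the function names `cosine` and `rank_labels`, and the toy two-dimensional vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_labels(topic_terms, word_vecs, label_vecs):
    """Rank candidate labels by mean cosine similarity to the topic's terms.

    topic_terms: list of the topic's top terms
    word_vecs:   dict mapping term -> embedding (hypothetical, precomputed)
    label_vecs:  dict mapping candidate label -> embedding (hypothetical, precomputed)
    """
    scores = {}
    for label, label_vec in label_vecs.items():
        # Terms without an embedding are skipped rather than treated as zero.
        sims = [cosine(label_vec, word_vecs[t]) for t in topic_terms if t in word_vecs]
        scores[label] = sum(sims) / len(sims) if sims else 0.0
    # Highest-scoring label first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, invented purely for illustration.
word_vecs = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "investor": [0.85, 0.15],
}
label_vecs = {
    "Stock market": [0.9, 0.1],
    "Association football": [0.1, 0.9],
}

ranking = rank_labels(["stock", "market", "investor"], word_vecs, label_vecs)
print(ranking)
```

In practice the embeddings would come from trained document and word embedding models rather than hand-written vectors; the ranking logic is independent of how they are obtained.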
