Topic Browsing for Research Papers with Hierarchical Latent Tree Analysis

Academic researchers often have to cope with a large collection of research papers in the literature. The problem can be even worse for postgraduate students who are new to a field and may not know where to start. To address this problem, we have developed an online catalog of research papers in which the papers are automatically categorized by a topic model. The catalog contains 7719 papers from the proceedings of two artificial intelligence conferences from 2000 to 2015. Rather than the commonly used Latent Dirichlet Allocation, we use a recently proposed method called hierarchical latent tree analysis for topic modeling. The resulting topic model contains a hierarchy of topics, so users can browse the topics from the top level to the bottom level. The model has a manageable number of general topics at the top level and allows thousands of fine-grained topics at the bottom level. It can also detect topics that have emerged recently.
