Studying the History of Ideas Using Topic Models

How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006. We induce topic clusters using Latent Dirichlet Allocation and examine the strength of each topic over time. Our methods find trends in the field including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline in research on semantics and understanding between 1978 and 2001, possibly followed by a rise after 2001. We also introduce a model of the diversity of ideas, topic entropy, and use it to show that COLING is a more diverse conference than ACL, but that both conferences, as well as EMNLP, are becoming broader over time. Finally, we apply Jensen-Shannon divergence of topic distributions to show that all three conferences are converging in the topics they cover.
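
The two measures named in the abstract can be illustrated with a minimal sketch. It assumes each conference in a given year is summarized by a single probability distribution over topics, here obtained by averaging per-document topic proportions. The function names, the averaging step, and the use of base-2 logarithms are assumptions made for illustration, not details taken from the paper.

```python
import math


def conference_topic_distribution(doc_topic_rows):
    """Average per-document topic proportions into one distribution.

    Assumption: every row is a probability distribution over the same
    K topics (for example, the output of an LDA model).
    """
    n = len(doc_topic_rows)
    k = len(doc_topic_rows[0])
    return [sum(row[j] for row in doc_topic_rows) / n for j in range(k)]


def topic_entropy(p):
    """Shannon entropy in bits of a topic distribution p.

    Higher values mean probability mass is spread over more topics,
    which is read here as a broader, more diverse venue.
    """
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between distributions p and q.

    It is symmetric and bounded between 0 and 1 with base-2 logs;
    smaller values mean the two venues cover more similar topics.
    """
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical toy data: two venues, three documents each, four topics.
    venue_a = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1], [0.8, 0.1, 0.05, 0.05]]
    venue_b = [[0.25, 0.25, 0.25, 0.25], [0.3, 0.2, 0.3, 0.2], [0.2, 0.3, 0.2, 0.3]]
    dist_a = conference_topic_distribution(venue_a)
    dist_b = conference_topic_distribution(venue_b)
    print("entropy A:", topic_entropy(dist_a))
    print("entropy B:", topic_entropy(dist_b))
    print("JS divergence:", js_divergence(dist_a, dist_b))
```

Under these assumptions, computing `topic_entropy` per venue and year would trace how broad each conference is over time, and computing `js_divergence` between pairs of venues per year would show whether their topic coverage is converging.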
