Scalable inference of topic evolution via models for latent geometric structures

We develop new models and algorithms for learning the temporal dynamics of topic polytopes and related geometric objects that arise in topic-model-based inference. Our model is nonparametric Bayesian, and the corresponding inference algorithm is able to discover new topics as time progresses. By exploiting the connection between the modeling of topic polytope evolution, the Beta-Bernoulli process, and the Hungarian matching algorithm, our method is shown to be several orders of magnitude faster than existing topic modeling approaches, as demonstrated by experiments that process several million documents in under two dozen minutes.
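To make the role of the Hungarian matching algorithm concrete, here is a minimal sketch of how topics estimated at a new time point might be matched to existing topics, with unmatched topics treated as newly discovered. It is an illustration under stated assumptions, not the paper's method: the squared Euclidean cost, the fixed `new_topic_penalty`, and the names `hungarian`, `sq_dist`, and `match_topics` are all hypothetical, whereas the paper derives its matching objective from the Beta-Bernoulli process.

```python
def hungarian(cost):
    """Minimum-cost assignment (Kuhn's Hungarian method, O(n^2 m) variant).

    cost is an n x m list of lists with n <= m.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed cost, not the paper's)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_topics(new_topics, global_topics, new_topic_penalty):
    """Match each new topic to a global topic index, or None if it is new.

    One dummy column per new topic lets any topic opt out of matching
    at a fixed cost of new_topic_penalty (a hypothetical stand-in).
    """
    if not new_topics:
        return []
    k = len(global_topics)
    cost = [
        [sq_dist(t, g) for g in global_topics]
        + [new_topic_penalty] * len(new_topics)
        for t in new_topics
    ]
    return [c if c < k else None for c in hungarian(cost)]


# Toy word distributions over a three-word vocabulary.
global_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
new_topics = [[0.1, 0.2, 0.7], [0.68, 0.22, 0.10], [0.1, 0.8, 0.1]]
print(match_topics(new_topics, global_topics, new_topic_penalty=0.1))
# prints [1, 0, None]
```

In this toy run the first two new topics are matched to existing topics, and the third is far from both, so it is flagged as a new topic. A single assignment solve per time step replaces iterative sampling or optimization over documents, which is the kind of saving the speed claim above relies on.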
