Sparsemax and Relaxed Wasserstein for Topic Sparsity

Topic sparsity refers to the observation that individual documents usually focus on a few salient topics rather than covering a wide variety of topics, and that a real topic draws on a narrow range of terms rather than spanning the whole vocabulary. Understanding topic sparsity is especially important for analyzing user-generated web content and social media, which typically take the form of extremely short posts and discussions. As the topic sparsity of individual documents in online social media increases, so does the difficulty of analyzing such text sources with traditional methods. In this paper, we propose two novel neural topic models that produce sparse posterior distributions over topics via a Gaussian sparsemax construction, enabling efficient training by stochastic backpropagation. We construct an inference network conditioned on the input data and fit the variational distribution with the relaxed Wasserstein (RW) divergence. Unlike existing work based on the Gaussian softmax construction and the Kullback-Leibler (KL) divergence, our approaches identify latent topic sparsity while maintaining training stability, predictive performance, and topic coherence. Experiments on large text corpora of different genres demonstrate the effectiveness of our models, which outperform both probabilistic and neural methods.
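
To make the Gaussian sparsemax construction concrete, the sketch below draws a latent Gaussian vector via the reparameterization trick and projects it onto the probability simplex with sparsemax (Martins & Astudillo, 2016), yielding a topic distribution with exact zeros. This is a minimal NumPy illustration, not the authors' implementation; the function names and toy parameters are our own assumptions.

    import numpy as np

    def sparsemax(z):
        # Euclidean projection of a 1-D logit vector onto the probability simplex.
        # Many output entries are exactly zero, unlike softmax.
        z = np.asarray(z, dtype=np.float64)
        z_sorted = np.sort(z)[::-1]                  # sort logits in descending order
        k = np.arange(1, z.size + 1)
        cssv = np.cumsum(z_sorted)                   # cumulative sums of sorted logits
        support = 1.0 + k * z_sorted > cssv          # support condition
        k_z = k[support][-1]                         # size of the support set
        tau = (cssv[support][-1] - 1.0) / k_z        # threshold
        return np.maximum(z - tau, 0.0)

    def gaussian_sparsemax_sample(mu, log_sigma, rng=None):
        # Reparameterized draw of a sparse topic distribution:
        # sample z ~ N(mu, sigma^2), then map z through sparsemax.
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(log_sigma) * eps
        return sparsemax(z)

    # Example: a 5-topic posterior concentrating mass on two topics.
    theta = gaussian_sparsemax_sample(np.array([2.0, 1.5, -1.0, -1.0, -1.0]),
                                      np.log(0.1) * np.ones(5))
    print(theta, theta.sum())                        # sparse vector summing to 1

With near-zero noise, the example logits [2.0, 1.5, -1.0, -1.0, -1.0] map to roughly [0.75, 0.25, 0, 0, 0]: the three low-scoring topics receive exactly zero probability, which is the topic sparsity behavior the model exploits.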
