Paper information - Semantic visualization for spherical representation

Semantic visualization for spherical representation

Visualization of high-dimensional data such as text documents is widely applicable. The traditional approach is to find an appropriate embedding of the high-dimensional representation in a low-dimensional visualizable space. As topic modeling is a useful form of dimensionality reduction that preserves the semantics in documents, recent approaches aim for a visualization that is consistent with both the original word space and the semantic topic space. In this paper, we address the semantic visualization problem. Given a corpus of documents, the objective is to simultaneously learn the topic distributions as well as the visualization coordinates of documents. We propose a semantic visualization model that approximates L2-normalized data directly. The key is to associate each document with three representations: a coordinate in the visualization space, a multinomial distribution in the topic space, and a directional vector in a high-dimensional unit hypersphere in the word space. We join these representations in a unified generative model, and describe its parameter estimation through variational inference. Comprehensive experiments on real-life text datasets show that the proposed method outperforms the existing baselines on objective evaluation metrics for visualization quality and topic interpretability.
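As a rough illustration of what "L2-normalized data" and "a directional vector in a unit hypersphere" mean in the abstract, the sketch below scales a term-count vector to unit length, so that only its direction in the word space matters. It is not the paper's model or inference procedure. The helper names `l2_normalize` and `cosine_similarity` and the toy counts are assumptions made for this example.

```python
import math


def l2_normalize(counts):
    """Scale a term-count vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [c / norm for c in counts]


def cosine_similarity(u, v):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical term counts over a 4-word vocabulary (illustrative only).
doc_a = l2_normalize([3, 0, 4, 0])
doc_b = l2_normalize([6, 0, 8, 0])  # same word proportions, twice as long
doc_c = l2_normalize([0, 5, 0, 0])  # shares no words with doc_a

print(cosine_similarity(doc_a, doc_b))  # close to 1: same direction
print(cosine_similarity(doc_a, doc_c))  # 0: orthogonal directions
```

After normalization, document length no longer affects the comparison: `doc_a` and `doc_b` map to the same point on the unit hypersphere even though one has twice as many word occurrences.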

Tuan M. V. Le | Hady W. Lauw
