Visualizing Documents based on Topic Models

We propose a method based on a topic model for visualizing documents together with their latent topic structure. Our method assumes that both documents and topics have latent coordinates in a two-dimensional Euclidean space, the visualization space. It visualizes documents by treating the generative process of documents as a mapping from the visualization space into the space of documents. A visualization, i.e., the latent coordinates of the documents, is obtained by fitting the model to a given document collection with the EM algorithm. In experiments, we show that the proposed model places related documents closer together than conventional visualization methods do.
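The abstract does not give the model equations, so the following is only a minimal sketch of one plausible instantiation. Its specifics are assumptions, not the authors' stated formulation. The topic proportions of a document at coordinate x_n are taken to be a softmax of the negative half squared Euclidean distances to the topic coordinates phi_k. Each word is drawn from a topic-specific word distribution theta_k. Fitting uses a generalized EM: theta gets its closed-form update, while the coordinates take a single gradient-ascent step per iteration instead of being optimized fully. The function names, the step size `lr`, the Gaussian-prior weight `prior` and the smoothing constant `alpha` are all hypothetical.

```python
import math
import random


def topic_proportions(x, phis):
    """Assumed form: p(z=k | x) = softmax_k(-0.5 * ||x - phi_k||^2)."""
    logits = [-0.5 * ((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) for p in phis]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(docs, xs, phis, theta):
    """Sum over tokens of log sum_k p(k | x_n) * theta[k][w]."""
    ll = 0.0
    for doc, x in zip(docs, xs):
        props = topic_proportions(x, phis)
        for w in doc:
            ll += math.log(sum(p * theta[k][w] for k, p in enumerate(props)))
    return ll


def fit(docs, vocab_size, n_topics, n_iters=30, lr=0.2, prior=0.01, alpha=0.01, seed=0):
    """Hypothetical generalized EM: exact theta update, one gradient step for coordinates."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in docs]                # document coordinates
    phis = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_topics)]   # topic coordinates
    theta = []
    for _ in range(n_topics):
        row = [rng.random() + 0.1 for _ in range(vocab_size)]
        s = sum(row)
        theta.append([v / s for v in row])
    total_words = sum(len(d) for d in docs)

    for _ in range(n_iters):
        # E-step: topic responsibilities per token, accumulated per document (R)
        # and per topic-word pair (counts, smoothed by the assumed constant alpha).
        P = [topic_proportions(x, phis) for x in xs]
        R = [[0.0] * n_topics for _ in docs]
        counts = [[alpha] * vocab_size for _ in range(n_topics)]
        for n, doc in enumerate(docs):
            for w in doc:
                joint = [P[n][k] * theta[k][w] for k in range(n_topics)]
                z = sum(joint)
                for k in range(n_topics):
                    R[n][k] += joint[k] / z
                    counts[k][w] += joint[k] / z

        # M-step for theta: normalize the expected counts.
        theta = []
        for row in counts:
            s = sum(row)
            theta.append([c / s for c in row])

        # M-step for coordinates: gradient of the expected log-likelihood, where each
        # topic pulls with weight (R[n][k] - M_n * P[n][k]); -prior * x is an assumed Gaussian prior.
        grad_x, grad_phi = [], []
        for n, doc in enumerate(docs):
            g = [-prior * xs[n][0], -prior * xs[n][1]]
            for k in range(n_topics):
                c = R[n][k] - len(doc) * P[n][k]
                g[0] += c * (phis[k][0] - xs[n][0])
                g[1] += c * (phis[k][1] - xs[n][1])
            grad_x.append(g)
        for k in range(n_topics):
            g = [-prior * phis[k][0], -prior * phis[k][1]]
            for n, doc in enumerate(docs):
                c = R[n][k] - len(doc) * P[n][k]
                g[0] += c * (xs[n][0] - phis[k][0])
                g[1] += c * (xs[n][1] - phis[k][1])
            grad_phi.append(g)

        # Simultaneous update; steps are scaled by word counts to keep them bounded.
        for n, doc in enumerate(docs):
            step = lr / max(len(doc), 1)
            xs[n] = [xs[n][0] + step * grad_x[n][0], xs[n][1] + step * grad_x[n][1]]
        step_phi = lr * n_topics / total_words
        for k in range(n_topics):
            phis[k] = [phis[k][0] + step_phi * grad_phi[k][0], phis[k][1] + step_phi * grad_phi[k][1]]

    return xs, phis, theta
```

For example, `fit(docs, vocab_size=6, n_topics=2)` on documents given as lists of word ids returns 2-D coordinates for every document and topic, which can be plotted directly. Taking one gradient step per iteration keeps the sketch short. A full quasi-Newton optimization of the coordinates in each M-step would be a natural alternative.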
