Towards Visual Exploration of Topic Shifts

This paper presents two approaches for visually analyzing topic shifts in a collection of documents over a given period of time. The first method is based on multidimensional scaling: it places vectors that represent how often terms occur in particular years (period-frequency vectors) in a two-dimensional space. This visualization makes it possible to detect terms that occur mainly in documents published in particular years, as well as terms that are spread over several years. The second method is graph-based. Publishing dates of documents and the terms they contain are represented as vertices of a graph, and each term related to a specific publishing year is connected to that year's vertex by an edge. Using spreading activation techniques, terms that occur frequently in documents published in particular years can be discovered visually. We tested both approaches on 2431 abstracts of papers published in the IEEE Transactions on SMC-A, SMC-B, and SMC-C from 1996 to 2006. Our experiments indicate that a number of interesting terms can be clearly separated into groups corresponding to individual years or periods of time. In addition, one can visualize how specific terms emerge over certain periods and how these and other terms fade away again later.
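To make the two data structures concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a period-frequency vector holds, for each term, the number of documents per year that contain it, and that edges between year and term vertices are weighted by those counts. The function names, the `decay` and `steps` parameters, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def period_frequency_vectors(docs, years):
    """Map each term to a list of per-year document counts (assumed definition)."""
    index = {year: i for i, year in enumerate(years)}
    vectors = defaultdict(lambda: [0] * len(years))
    for year, terms in docs:
        for term in set(terms):  # count each document at most once per term
            vectors[term][index[year]] += 1
    return dict(vectors)


def build_graph(vectors, years, min_count=1):
    """Build an undirected graph with year vertices and term vertices.

    A term is linked to a year if it occurs in at least `min_count`
    documents of that year; the count is used as the edge weight (assumption).
    """
    graph = defaultdict(dict)
    for term, counts in vectors.items():
        for year, count in zip(years, counts):
            if count >= min_count:
                graph[("year", year)][("term", term)] = count
                graph[("term", term)][("year", year)] = count
    return graph


def spread_activation(graph, seeds, steps=1, decay=0.5):
    """Simple spreading activation: each active vertex passes a decayed share
    of its activation to its neighbours, proportional to edge weight."""
    activation = {vertex: 0.0 for vertex in graph}
    for seed in seeds:
        activation[seed] = 1.0
    for _ in range(steps):
        incoming = defaultdict(float)
        for vertex, value in activation.items():
            if value == 0.0:
                continue
            total = sum(graph[vertex].values())
            for neighbour, weight in graph[vertex].items():
                incoming[neighbour] += decay * value * weight / total
        for neighbour, extra in incoming.items():
            activation[neighbour] += extra
    return activation


# Toy corpus: (publishing year, terms of one abstract). Entirely made up.
docs = [
    (1996, ["fuzzy", "control"]),
    (1996, ["fuzzy", "neural"]),
    (2006, ["agent", "neural"]),
    (2006, ["agent"]),
]
years = [1996, 2006]

vectors = period_frequency_vectors(docs, years)
graph = build_graph(vectors, years)
activation = spread_activation(graph, seeds=[("year", 1996)])

# Terms ranked by how strongly the 1996 vertex activates them.
ranked = sorted(
    (v for v in activation if v[0] == "term"),
    key=lambda v: activation[v],
    reverse=True,
)
```

In this toy example, activating the 1996 vertex ranks "fuzzy" highest and leaves "agent" inactive after one step. The same period-frequency vectors could serve as input to a multidimensional scaling routine for the first method; that step is omitted here.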
