ProbMap - A probabilistic approach for mapping large document collections

The visualization of large text databases and document collections is an important step towards more flexible and interactive forms of information access and retrieval. This paper presents a probabilistic approach that combines a statistical, model-based analysis of a given set of documents with a topological visualization principle. Our method can be used to derive topic maps, which represent topical information by characteristic keyword distributions arranged in a two-dimensional spatial layout. Combined with multi-resolution techniques, this provides a three-dimensional space for interactive information navigation in large text collections.
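
As an illustration only, a topic map of this kind can be sketched in Python as a topology-preserving mixture of multinomials. Each document is softly assigned to a node on a small 2D grid. The assignment score averages the document's log-likelihood over a Gaussian grid neighborhood, and each node's keyword distribution is then re-estimated from neighborhood-weighted word counts, in the style of a batch self-organizing map. The function names grid_kernel, fit_topic_map and top_keywords, the Gaussian kernel, its width sigma and the additive smoothing eps are all hypothetical choices for this sketch. It is not claimed to be ProbMap's exact model or update rules.

```python
import math
import random

# Hypothetical sketch: a batch-SOM-style, topology-preserving mixture of
# multinomials. It is NOT claimed to be ProbMap's exact algorithm.


def grid_kernel(rows, cols, sigma):
    """Gaussian neighborhood h[k][l] between the nodes of a rows x cols grid.

    Each row is normalized to sum to 1 (an assumption of this sketch)."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    kernel = []
    for r1, c1 in coords:
        weights = [math.exp(-((r1 - r2) ** 2 + (c1 - c2) ** 2) / (2.0 * sigma ** 2))
                   for r2, c2 in coords]
        total = sum(weights)
        kernel.append([w / total for w in weights])
    return kernel


def fit_topic_map(docs, vocab_size, rows=2, cols=2, sigma=0.5, iters=50, eps=0.01, seed=0):
    """Fit a grid of keyword distributions to docs (a list of {word_id: count} dicts).

    Returns (prior, word_dists, resp): node priors, one word distribution per
    grid node, and each document's soft assignment to the nodes."""
    rng = random.Random(seed)
    n_nodes = rows * cols
    h = grid_kernel(rows, cols, sigma)
    # Random, strictly positive initial keyword distributions break the symmetry.
    word_dists = []
    for _ in range(n_nodes):
        raw = [0.5 + rng.random() for _ in range(vocab_size)]
        total = sum(raw)
        word_dists.append([v / total for v in raw])
    prior = [1.0 / n_nodes] * n_nodes
    resp = [[1.0 / n_nodes] * n_nodes for _ in docs]

    for _ in range(iters):
        # E-step: score each node by its log prior plus the neighborhood-averaged
        # log-likelihood of the document, then normalize with a stable softmax.
        log_p = [[math.log(v) for v in dist] for dist in word_dists]
        resp = []
        for doc in docs:
            ll = [sum(n * log_p[l][w] for w, n in doc.items()) for l in range(n_nodes)]
            scores = [math.log(max(prior[k], 1e-300))
                      + sum(h[k][l] * ll[l] for l in range(n_nodes))
                      for k in range(n_nodes)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: priors are average responsibilities; node l's word distribution
        # comes from counts weighted by sum_k resp[d][k] * h[k][l], plus additive
        # smoothing eps so that no word probability becomes zero.
        prior = [sum(r[k] for r in resp) / len(docs) for k in range(n_nodes)]
        word_dists = []
        for l in range(n_nodes):
            counts = [eps] * vocab_size
            for doc, r in zip(docs, resp):
                weight = sum(r[k] * h[k][l] for k in range(n_nodes))
                for w, n in doc.items():
                    counts[w] += weight * n
            total = sum(counts)
            word_dists.append([c / total for c in counts])
    return prior, word_dists, resp


def top_keywords(word_dists, vocab, n=3):
    """Return the n most probable words of each node: the labels drawn on the map."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -dist[w])[:n]]
            for dist in word_dists]
```

For example, calling fit_topic_map(docs, len(vocab), rows=3, cols=3) and then top_keywords(word_dists, vocab) gives one keyword list per grid cell, which can be drawn at that cell's 2D position. A wider sigma pulls neighboring cells toward similar keyword distributions. As sigma shrinks, the kernel approaches the identity and the updates reduce to ordinary EM for a smoothed mixture of multinomials.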
