LSAView: A tool for visual exploration of latent semantic modeling

Latent Semantic Analysis (LSA) is a commonly used method for automated processing, modeling, and analysis of unstructured text data. One of the biggest challenges in using LSA is determining appropriate model parameters for different data domains and types of analysis. Although automated methods have been developed to choose rank and scaling parameters, these approaches often make their choices with respect to noise in the data, without regard to how those choices affect analysis and problem solving. Further, no tools currently exist for exploring the relationships between an LSA model and analysis methods. Our work focuses on how parameter choices affect analysis and problem solving. In this paper, we present LSAView, a system for interactively exploring parameter choices for LSA models. We illustrate the use of LSAView's small multiple views, linked matrix-graph views, and data views to analyze parameter selection and application in the context of graph layout and clustering.
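To make the two parameters concrete, the following is a minimal sketch of an LSA model built from a truncated SVD, assuming NumPy is available. It is not LSAView's implementation: the function names, the `scale_power` exponent used as a stand-in for a scaling parameter, and the toy term-by-document matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lsa_embed(term_doc, rank, scale_power=1.0):
    """Return rank-limited document vectors, one row per document.

    term_doc    : terms x documents matrix of counts or weights
    rank        : number of singular dimensions to keep
    scale_power : hypothetical scaling parameter; each kept dimension
                  is weighted by (singular value) ** scale_power
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(rank, len(s))
    # Rows of Vt[:k].T are documents; broadcasting scales each column.
    return Vt[:k].T * (s[:k] ** scale_power)


def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T


# Made-up term-by-document counts (rows: terms, columns: documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

if __name__ == "__main__":
    # Show how document 0's similarities to all documents shift with each setting.
    for rank in (1, 2, 3):
        for power in (0.0, 1.0):
            sims = cosine_similarities(lsa_embed(A, rank, power))
            print(f"rank={rank} scale_power={power}:", np.round(sims[0], 2))
```

Running the loop prints one row of the document similarity matrix per parameter setting. Downstream steps such as clustering or graph layout would consume these similarities, so they inherit the effect of the rank and scaling choices.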
