VisIRR: A Visual Analytics System for Information Retrieval and Recommendation for Large-Scale Document Data

In this article, we present VisIRR, an interactive visual information retrieval and recommendation system for large-scale document discovery. VisIRR combines two paradigms: (1) a passive pull, in which users retrieve documents through queries, and (2) an active push, in which the system recommends items of potential interest based on user preferences. Equipped with an efficient dynamic query interface against a large-scale corpus, VisIRR organizes the retrieved documents into high-level topics and visualizes them in a 2D space, showing the relationships among the topics along with their keyword summaries. In addition, based on interactive, personalized preference feedback on individual documents, VisIRR recommends documents from the entire corpus, including documents outside the retrieved set. The recommended documents are visualized in the same space as the retrieved documents, so that users can seamlessly analyze both existing and newly recommended ones. This article presents novel computational methods that make these integrated representations and fast interactions possible for a large-scale document corpus. We illustrate how the system works through detailed usage scenarios and present preliminary user study results that evaluate its effectiveness.
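The abstract does not specify how preference feedback is turned into recommendations, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are represented as term-weight vectors, user feedback is a signed rating on a few documents, and each unrated document is scored by one step of similarity-weighted preference propagation. The function name `recommend`, the toy `corpus`, and the use of cosine similarity are all illustrative assumptions, not the method used in VisIRR.

```python
import numpy as np

def recommend(doc_vectors, ratings, top_k=2):
    """Rank unrated documents by similarity-weighted preference.

    doc_vectors: (n_docs, n_terms) array of term weights (hypothetical input).
    ratings: dict mapping document index -> preference (positive = liked).
    """
    X = np.asarray(doc_vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)  # scale rows to unit length
    sim = X @ X.T                              # pairwise cosine similarity

    pref = np.zeros(X.shape[0])
    for idx, value in ratings.items():
        pref[idx] = value

    scores = sim @ pref                        # one propagation step
    # Only documents the user has not rated are candidates.
    candidates = [i for i in range(X.shape[0]) if i not in ratings]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:top_k]

# Toy corpus: rows are documents, columns are term weights.
corpus = np.array([
    [3, 1, 0, 0],  # doc 0: rated positively below
    [2, 2, 0, 0],  # doc 1: shares terms with doc 0
    [0, 0, 3, 1],  # doc 2: rated negatively below
    [0, 0, 1, 3],  # doc 3: shares terms with doc 2
    [1, 0, 0, 1],  # doc 4: overlaps with both groups
])
top = recommend(corpus, {0: 1.0, 2: -1.0}, top_k=2)
print(top)
```

Because scoring runs over every row of the matrix rather than over a query result, a document can be recommended even if no query retrieved it, which mirrors the "active push" described in the abstract. A system at the scale the abstract describes would need sparse data structures and a more careful propagation scheme than this dense one-step version.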
