Evaluating visual analytics for text information retrieval

Retrieving information from document collections is necessary in many contexts: researchers wish to retrieve papers on a research topic, physicians search for patient records related to a certain condition, and police investigators seek relationships in criminal reports. Common to these scenarios are users who need to identify relevant textual information in a document collection. The task is challenging, especially when users expect a retrieval process that misses few or none of the relevant documents. Visual Analytics (VA) approaches are often advocated to support document retrieval tasks. VA integrates interactive visualizations and machine learning algorithms so that a domain expert can gradually steer a system toward identifying the relevant documents. As an example, TRIVIR is a state-of-the-art system that allows users to explore a corpus while providing feedback to a classifier that suggests documents potentially relevant to a reference query document. Assessing VA-supported Information Retrieval (IR) strategies is also challenging, as using these systems typically involves many conceptual and practical aspects, and text retrieval tasks can demand considerable cognitive effort. In this paper, we present results from observational studies on VA-supported text information retrieval. We conducted sessions with graduate students and researchers using TRIVIR to explore scientific papers for the purpose of literature review. A first study allowed us to collect opinions and identify some usability issues and practical limitations of the available implementation. After addressing some critical issues observed at the interface level, we conducted a second round of sessions to collect further user opinions regarding a retrieval process assisted by VA. We concluded that most users have a very positive view of the system's usability and its ability to facilitate their retrieval tasks. Nonetheless, we also learned that a proper introduction to the role of the interface elements is important and that conveying the underlying conceptual model and its limitations can be difficult. We observed considerable variation in how users assessed specific functionalities, and some users may face practical difficulties in using the system autonomously in an optimal way.
