Classifying complex topics using spatial-semantic document visualization: an evaluation of an interaction model to support open-ended search tasks

In this dissertation we propose, develop and test a novel search interaction model to address two key problems associated with conducting an open-ended search task within a classical information retrieval system: (i) the need to reformulate the query within the context of a shifting conception of the problem and (ii) the need to integrate relevant results across a number of separate result sets. In our model the user issues just one high-recall query and then performs a sequence of more focused, distinct aspect searches by browsing the static structured context of a spatial-semantic visualization of the retrieved document set. Our thesis is that unsupervised spatial-semantic visualization can automatically classify retrieved documents into a two-level hierarchy of relevance. In particular, we hypothesise that the locality of any given aspect exemplar will tend to comprise a sufficient proportion of same-aspect documents to support a visually guided strategy for focused, same-aspect searching that we term the aspect cluster growing strategy. We examine spatial-semantic classification and potential aspect cluster growing performance across three scenarios derived from topics and relevance judgements from the TREC test collection. Our analyses show that the expected classification can be represented in spatial-semantic structures created from document similarities computed by a simple vector space text analysis procedure. We compare two diametrically opposed approaches to layout optimisation: a global approach that focuses on preserving all similarities and a local approach that focuses only on the strongest similarities. We find that the local approach, based on a minimum spanning tree of similarities, produces a better classification and, as observed from strategy simulation, more efficient aspect cluster growing performance in most situations, compared to the global approach of multidimensional scaling.
We show that a small but significant proportion of aspect cluster growing cases can be problematic, regardless of the layout algorithm used. We identify the characteristics of these cases and, on this basis, demonstrate a set of novel interactive tools that provide additional semantic cues to aid the user in locating same-aspect documents.
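The local approach described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes raw term counts as the vector space representation, cosine similarity as the document similarity measure, and Prim's algorithm to keep only the strongest links as a spanning tree. The function names (`cosine`, `similarity_matrix`, `strongest_link_tree`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Pairwise cosine similarities from whitespace-tokenised documents.

    Assumption: raw term counts, with no stemming, stop-word removal or
    term weighting.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def strongest_link_tree(sim):
    """Prim's algorithm, choosing the most similar outside document each step.

    Maximising similarity is equivalent to building a minimum spanning tree
    over distances such as 1 - similarity. Returns (i, j, similarity) edges.
    """
    n = len(sim)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in sorted(in_tree):
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or sim[i][j] > best[2]:
                    best = (i, j, sim[i][j])
        edges.append(best)
        in_tree.add(best[1])
    return edges


if __name__ == "__main__":
    docs = [
        "oil spill tanker",
        "oil spill cleanup",
        "tax reform bill",
        "tax reform vote",
    ]
    for i, j, s in strongest_link_tree(similarity_matrix(docs)):
        print(f"doc {i} -- doc {j} (similarity {s:.2f})")
```

In this toy set the two same-topic pairs end up directly linked, with a single weak edge joining the two groups. A global method such as multidimensional scaling would instead try to fit all pairwise similarities at once.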
