Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques

The Internet provides an exceptional testbed for developing algorithms that can improve browsing and searching of large information spaces. Browsing and searching tasks are susceptible to problems of information overload and vocabulary differences, and much current research aims to develop and refine algorithms that address these problems. Our research focused on whether two algorithms developed by our research group, a Kohonen category map for browsing and an automatically generated concept space for searching, can help improve browsing and/or searching the Internet.

Our results indicate that a Kohonen self-organizing map (SOM)-based algorithm can successfully categorize a large and eclectic Internet information space (the Entertainment subcategory of Yahoo!) into manageable sub-spaces that users can successfully navigate to locate a homepage of interest to them. The SOM algorithm worked best with browsing tasks that were very broad, and in which subjects skipped around between categories. Subjects especially liked the visual and graphical aspects of the map. Subjects who tried to do a directed search, and those who wanted to use more familiar mental models (alphabetic or hierarchical organization) for browsing, found that the map did not work well.

The results from the concept space experiment were especially encouraging. There were no significant differences among the precision measures for the sets of documents identified by subject-suggested terms, thesaurus-suggested terms, and the combination of subject- and thesaurus-suggested terms. The recall measures indicated that the combination of subject- and thesaurus-suggested terms exhibited significantly better recall than subject-suggested terms alone. Furthermore, analysis of the homepages indicated that there was limited overlap between the homepages retrieved by the subject-suggested terms and those retrieved by the thesaurus-suggested terms. Since the retrieved homepages were for the most part different, this suggests that a user can enhance a keyword-based search by using an automatically generated concept space. Subjects especially liked the level of control that they could exert over the search, and the fact that the terms suggested by the thesaurus were real (i.e., originating in the homepages) and therefore guaranteed to have retrieval success.
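
To make the precision, recall, and overlap comparisons concrete, the following is a minimal sketch of how such measures are commonly computed over sets of retrieved homepages. It is not the authors' evaluation code. The page identifiers and the relevance set are hypothetical, and measuring overlap as shared pages divided by all pages retrieved by either term set is an assumption made for illustration; the abstract does not say how overlap was measured.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved pages that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant pages that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def overlap(a, b):
    """Shared pages divided by all pages in either set (assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: page IDs judged relevant, and pages retrieved by each term set.
relevant = {"p1", "p2", "p3", "p4", "p5", "p6"}
subject_hits = {"p1", "p2", "p7"}            # from subject-suggested terms
thesaurus_hits = {"p2", "p3", "p4", "p8"}    # from thesaurus-suggested terms
combined_hits = subject_hits | thesaurus_hits

for name, hits in [("subject", subject_hits),
                   ("thesaurus", thesaurus_hits),
                   ("combined", combined_hits)]:
    print(f"{name}: precision={precision(hits, relevant):.2f} "
          f"recall={recall(hits, relevant):.2f}")

print(f"overlap(subject, thesaurus)={overlap(subject_hits, thesaurus_hits):.2f}")
```

In this toy data the two term sets share only one page, so combining them raises recall from 0.33 to 0.67 while precision stays the same. That is the same qualitative pattern the abstract reports, though the numbers here are invented.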
