Automated generation of category-specific thesauri for interactive query expansion

The categorisation of documents into subject-specific categories is a useful enhancement for large document collections addressed by information retrieval systems, as a user can first browse a category tree in search of the category that best matches her interests, and then issue a query for more specific documents "from within the category". This approach combines the two modalities of information seeking that are most popular in Web-based search engines, i.e. category-based site browsing (as exemplified by YAHOO) and keyword-based document querying (as exemplified by ALTAVISTA). Appropriate query expansion tools need to be provided, though, in order to allow the user to incrementally refine her query through further retrieval passes, thus allowing the system to produce a series of document rankings that hopefully converge to the user's expected ranking. In this work we propose that automatically generated, category-specific "associative" thesauri be used for this purpose. We describe a method for their generation, and discuss how the thesaurus specific to a given category may usefully be endowed with "gateways" to the thesauri specific to its parent and children categories.
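To make the idea of a category-specific associative thesaurus with gateways more concrete, the following is a minimal sketch, not the method of the paper. The abstract does not state which association measure is used, so the sketch assumes a cosine score over document-level term co-occurrence. The names `build_thesaurus`, `suggest`, `thesauri` and `parent` are hypothetical, and the gateway is modelled only in the direction of the parent category.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_thesaurus(docs):
    """Build an associative thesaurus from the documents of ONE category.

    docs: list of token lists. Returns a mapping term -> {related term: score}.
    Assumption: association is the cosine between the document-incidence
    vectors of two terms, i.e. co-occurrence count / sqrt(df_a * df_b).
    """
    doc_freq = Counter()
    co_freq = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))
        doc_freq.update(terms)
        co_freq.update(combinations(terms, 2))

    thesaurus = defaultdict(dict)
    for (a, b), n_ab in co_freq.items():
        score = n_ab / math.sqrt(doc_freq[a] * doc_freq[b])
        thesaurus[a][b] = score
        thesaurus[b][a] = score
    return thesaurus


def suggest(term, category, thesauri, parent, k=3):
    """Suggest up to k expansion terms for `term`, starting in `category`.

    If the category's own thesaurus yields fewer than k candidates, follow
    the (hypothetical) gateway to the parent category's thesaurus.
    Returns a list of (candidate, score, source_category) tuples.
    """
    suggestions = []
    seen = set()
    current = category
    while current is not None and len(suggestions) < k:
        neighbours = thesauri[current].get(term, {})
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, score in ranked:
            if candidate not in seen and len(suggestions) < k:
                suggestions.append((candidate, score, current))
                seen.add(candidate)
        current = parent.get(current)
    return suggestions
```

In an interactive setting, the candidates returned by `suggest` would be shown to the user, who picks the ones to add to the query before the next retrieval pass. The source category attached to each candidate shows whether it came from the category's own thesaurus or through the gateway.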
