Handling document expressions for capturing conceptual query–document relations

If the keywords or key phrases that characterize a query or a document are replaced by identifiers drawn from a conceptual hierarchy such as a thesaurus, better retrieval results can be expected when the conceptual relations between query identifiers and document identifiers are taken into account. If, however, judging whether the retrieved documents actually match the query takes time, the efficiency of the system as a whole decreases. To deal with this problem, a method is needed that makes it easy to see what kinds of relations exist between a query and a document, as well as what the document is about. This paper first discusses how to handle document expressions that clearly show how the conceptual relations between queries and documents arise, while contributing to improved retrieval efficiency. It then presents a method for simplifying document expressions that contain many identifiers. The effectiveness of the proposed method was confirmed through experiments on retrieval test collections.

© 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(9): 97–106, 2007; Published online in Wiley InterScience. DOI 10.1002sscj.10349
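The abstract does not specify how conceptual relations are classified or how document expressions are simplified. The following is a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a tree-shaped thesaurus in which each identifier has at most one broader term. Each document identifier is labeled as exact, narrower, broader, or unrelated with respect to the query identifiers. To simplify an expression, the deepest identifiers are generalized one level at a time until no more than a chosen number remain. The thesaurus contents and all function names (`BROADER`, `relation`, `document_expression`, `simplify`) are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical toy thesaurus: each identifier maps to its broader term (None = top level).
BROADER: dict[str, Optional[str]] = {
    "science": None,
    "computing": "science",
    "information_retrieval": "computing",
    "query_expansion": "information_retrieval",
    "indexing": "information_retrieval",
    "thesaurus": "indexing",
    "databases": "computing",
}

RELATION_ORDER = ["exact", "narrower", "broader", "unrelated"]  # strongest first


def ancestors(concept: str) -> list[str]:
    """Return the broader terms above `concept`, nearest first."""
    chain = []
    parent = BROADER[concept]
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain


def depth(concept: str) -> int:
    """Number of levels between `concept` and the top of the hierarchy."""
    return len(ancestors(concept))


def relation(query_id: str, doc_id: str) -> str:
    """Classify how a document identifier relates to one query identifier."""
    if query_id == doc_id:
        return "exact"
    if query_id in ancestors(doc_id):
        return "narrower"  # document concept is more specific than the query
    if doc_id in ancestors(query_id):
        return "broader"  # document concept is more general than the query
    return "unrelated"


def document_expression(query_ids: Iterable[str], doc_ids: Iterable[str]) -> dict[str, list[str]]:
    """Group document identifiers by their strongest relation to any query identifier."""
    queries = list(query_ids)
    grouped: dict[str, list[str]] = {r: [] for r in RELATION_ORDER}
    for d in sorted(doc_ids):
        rels = [relation(q, d) for q in queries]
        best = min(rels, key=RELATION_ORDER.index, default="unrelated")
        grouped[best].append(d)
    return grouped


def simplify(doc_ids: Iterable[str], limit: int) -> set[str]:
    """Generalize the deepest identifiers one level at a time until at most `limit` remain."""
    ids = set(doc_ids)
    while len(ids) > limit:
        deepest = max(depth(c) for c in ids)
        if deepest == 0:
            break  # only top-level concepts remain; nothing left to generalize
        ids = {BROADER[c] if depth(c) == deepest else c for c in ids}
    return ids


if __name__ == "__main__":
    doc = ["query_expansion", "computing", "databases", "information_retrieval"]
    print(document_expression(["information_retrieval"], doc))
    print(simplify({"query_expansion", "thesaurus", "databases"}, limit=2))
```

Grouping by relation type lets a reader see immediately whether a document matches the query exactly, is more specific, or only touches a broader topic. Generalizing from the deepest level first is one simple policy among many. A real system might instead merge only sibling identifiers, or weight identifiers by frequency before collapsing them.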
