Relevance in Textual Retrieval

Text processing has stimulated great interest over the last several years, prompted by technical advances in storage, searching, telecommunications, and user interfaces. The relevant technology includes electronic publishing, computer networks (e.g., electronic mail and bulletin boards), full-text systems, image databases, and hypermedia systems. The growing volume of generated text causes problems of storage and retrieval, and there are no signs of this trend abating. One major aspect of text processing is information retrieval, the determination of which of a set of documents or records should be retrieved in response to a user query for information. Information retrieval has been the subject of much research over the last several years, largely because determining which textual records are relevant to a user query is inherently imprecise. However, in spite of a variety of theoretical advances, including models, front ends, natural language processing, artificial intelligence methods, and relevance feedback, improved retrieval system performance has been an elusive target. Relevance determination is also needed beyond textual retrieval: deciding which rules should fire in an expert system is a relevance problem, too.

One can view an information retrieval system as two sets: a set of records that are identified, acquired, indexed, and stored, and a set of user queries for information. The queries are matched against the index to determine which subset of the stored records should be retrieved and presented to the user. The index can involve descriptive information (i.e., bibliographic information in the case of textual documents, such as author, title, and publisher) and content indicators (such as keywords or subject headings) that indicate what the document is "about." An important issue in generating these index terms is whether to use controlled vocabularies, and possibly thesauri, or free (natural) language.
Indexing can be seen as a mapping from pairs consisting of a document and a keyword (a term or phrase) into a set of values, often either {0,1} or [0,1], indicating how much the given document is "about" the concept(s) represented by the keyword. Often, relative term frequencies are used to measure this aboutness. A later development was to add weights to the terms in the user query (generally in the same interval as the index weights) in order to indicate their importance.
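The indexing function and the weighted query described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than a method taken from the text: the function names `index_weights` and `score`, the normalisation by the most frequent term in each document (one of several possible relative-frequency measures), and the weighted-average scoring rule (only one of many possible interpretations of query weights).

```python
from collections import Counter


def index_weights(documents):
    """Map each (document id, term) pair to a weight in [0, 1].

    Assumption: the weight is the term's count divided by the count of the
    most frequent term in the same document.
    """
    index = {}
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        if not counts:
            index[doc_id] = {}
            continue
        max_count = max(counts.values())
        index[doc_id] = {term: n / max_count for term, n in counts.items()}
    return index


def score(index, weighted_query):
    """Score every document against a query of term -> importance in [0, 1].

    Assumption: the score is the query-weighted average of the index weights,
    so it also stays in [0, 1]. Terms absent from a document contribute 0.
    """
    total = sum(weighted_query.values())
    scores = {}
    for doc_id, weights in index.items():
        matched = sum(q_w * weights.get(term, 0.0)
                      for term, q_w in weighted_query.items())
        scores[doc_id] = matched / total if total else 0.0
    return scores


# Toy collection and query, invented for this example.
docs = {
    "d1": "fuzzy retrieval fuzzy sets",
    "d2": "boolean retrieval systems",
}
index = index_weights(docs)
query = {"fuzzy": 1.0, "retrieval": 0.5}
ranking = sorted(score(index, query).items(),
                 key=lambda item: item[1], reverse=True)
```

With this toy data, "d1" ranks above "d2" because it contains the more heavily weighted query term. Replacing each weight with 1 when it is nonzero would give the binary {0,1} form of the index.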
