IR-related abstracts
Getting access to information stored in databases is an art which still requires experienced intermediaries between end-users of information and the databases. If the intermediary could be replaced by a system for automatic retrieval then the database market would be open to end-users. Automatic here means complete obviation not only of intermediaries but also of training. Such a system, a software package called 'DiRer' (for Direct Retrieval), has been developed and tested: the end-user enters a query in natural language; the system displays texts of documents; and the user indicates which are relevant to his/her query and which are not. The system requires an inverted as well as a linear file, and makes use of Boolean search strategies, co-occurrence of terms in documents, and relevance feedback for term weighting, document ranking and selection of successful subqueries. It is applicable to 'term databases', i.e., databases which consist of words (terms), such as keyword indexes or full texts. It can be used for in-house databases and for databases stored on CD-ROM, and it can be implemented on a mainframe computer for online access. It was applied to 249 abstracts of EXCERPTA MEDICA with 46,568 words. The retrieval results and earlier simulations of the system indicate that its performance (in terms of precision and recall), if compared with the performance of skilled intermediaries, is about equal for superficial and superior for exhaustive searches. For full text databases it renders intellectual indexing superfluous.

This article introduces a new methodology for studying the effects of term conflation and postings overlap on retrieval, under the assumption that pooling results for several searchers or questions (an approach often used in conventional research in online searching) may obscure or eliminate important relationships. The proposed methodology involves comprehensive quantitative analyses, based on the concept of elementary postings sets, to be performed on combinations of search terms and postings sets for a single online search. Definitions for Overlap(i,j) between two search terms and the global Overlap(F) for a search facet are proposed and evaluated. The methodology is tested for a particular case study, and is found to suggest insights not previously observed. Among the tentative findings are an understanding of why brief searches cannot achieve high recall; that postings overlap data among terms in a facet can be used to flag and help one understand the semantic differences between single-meaning and multi-meaning facets; that for single-meaning facets overlap among search terms in records is …
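
The abstract names Overlap(i,j) and Overlap(F) but does not give their formulas. As a rough illustration of the kind of quantity involved, the sketch below assumes a Jaccard-style ratio for two terms and, for a facet, the share of retrieved records matched by two or more terms. Both definitions, the function names `pairwise_overlap` and `facet_overlap`, and the sample postings are assumptions made for illustration and may differ from the article's own.

```python
from itertools import combinations


def pairwise_overlap(postings_i, postings_j):
    """Assumed definition: shared records divided by all records of either term."""
    union = postings_i | postings_j
    if not union:
        return 0.0
    return len(postings_i & postings_j) / len(union)


def facet_overlap(facet):
    """Assumed definition: fraction of the facet's records hit by two or more terms."""
    counts = {}
    for postings in facet.values():
        for record in postings:
            counts[record] = counts.get(record, 0) + 1
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= 2)
    return shared / len(counts)


# Hypothetical facet: three ORed search terms with made-up record ids.
facet = {
    "heart attack": {1, 2, 3, 4},
    "myocardial infarction": {3, 4, 5, 6},
    "cardiac arrest": {7, 8},
}

# Report the overlap for every pair of terms, then for the facet as a whole.
for (term_i, post_i), (term_j, post_j) in combinations(facet.items(), 2):
    print(f"{term_i!r} vs {term_j!r}: {pairwise_overlap(post_i, post_j):.2f}")
print(f"facet overlap: {facet_overlap(facet):.2f}")
```

Under these assumed definitions, a facet whose terms are near-synonyms would tend to show high pairwise values, while terms that retrieve disjoint record sets would score zero.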