Resource-limited information retrieval in Web-based environments

Commercial use of the Internet is increasing. It is therefore likely that information providers will increasingly charge end-users for the information they provide on such networks, and a need will arise for information retrieval procedures that take such costs into account. Currently, however, little support exists for cost-effective information retrieval in a networked environment with multiple information providers. In this paper we present a framework for resource-limited information retrieval that enables a user to search for relevant information under time and cost constraints, e.g. to handle information needs such as 'retrieve the five most relevant images containing white monkeys as fast as possible, but within 1 minute and for less than $4'.
We focus in our work on the 'retrieval strategy'. In our terminology, this strategy is the functional entity that is responsible for handling the query and that decides which information providers are queried and which objects (if any) are retrieved. We present such a strategy for resource-limited information retrieval (see also Velthausz, Eertink, Verhoosel, & Schot 1997). We assume that the objects are modelled using the ADMIRE information model (Velthausz, Bal & Eertink, 1996). This model facilitates aggregation and propagation of information that characterises reachable information objects. The composite relationships in the object hierarchy enable bottom-up propagation and aggregation of the characterisations of lower-layer objects. This information can subsequently be used to estimate the relevance of unexplored information objects. The use of summarised information to describe particular aspects of the lower-layer nodes in a hierarchy has also been reported by Garcia-Molina, Gravano & Shivakumar (1996) for the content characterisation of (hierarchical) databases containing textual documents. We have adapted some of their ideas for our prototype for Web-based information.
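The bottom-up aggregation described above can be illustrated with a minimal sketch. It is not the ADMIRE model itself: the class `InfoObject`, its fields, and the choice of summing keyword counts are assumptions made purely for illustration, and cosine similarity stands in for whatever relevance estimate the model actually uses.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """Hypothetical node in a composite object hierarchy."""
    name: str
    terms: Counter = field(default_factory=Counter)   # keyword counts of this object only
    children: list = field(default_factory=list)      # lower-layer objects

    def summary(self) -> Counter:
        """Aggregate this object's terms with the summaries of all children."""
        total = Counter(self.terms)
        for child in self.children:
            total.update(child.summary())   # Counter.update adds counts
        return total


def cosine(query: Counter, doc: Counter) -> float:
    """Cosine similarity between two keyword vectors (0.0 if either is empty)."""
    dot = sum(weight * doc[term] for term, weight in query.items())
    norm = math.sqrt(sum(v * v for v in query.values())) * \
           math.sqrt(sum(v * v for v in doc.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    page1 = InfoObject("page1", Counter({"monkey": 2, "white": 1}))
    page2 = InfoObject("page2", Counter({"car": 3}))
    site = InfoObject("site", Counter(), [page1, page2])
    # Estimate the relevance of the unexplored subtree from the summary at its root.
    print(cosine(Counter({"monkey": 1, "white": 1}), site.summary()))
```

In this sketch the summary at the root is computed on demand; a provider could equally precompute and store it, so that a retrieval strategy can judge a subtree without fetching any of the objects below it.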
In the prototype environment, we assume that the information provider supplies the characterisation of the information, either generated automatically from text files or, currently, produced by hand for multimedia information. The retrieval algorithm currently used in our prototype is the well-known vector-based keyword text-retrieval algorithm, with our own adaptations for time and cost constraints. These adaptations are based on Russell and Wefald's metareasoning decision theory (Russell & Wefald, 1991). In this paper, we first explain the context of our work. Subsequently, we explain how our strategy works in principle. Then, we show how the strategy can be applied to an ADMIRE-based information model that is generated from a WWW site.
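To make the notion of a resource-limited query concrete, the following sketch selects objects under a time and a cost budget, in the spirit of the example 'five most relevant images, within 1 minute, for less than $4'. It is a simple greedy heuristic and not the strategy developed in this paper, nor an implementation of Russell and Wefald's theory. The names `Candidate` and `select`, and the assumption that relevance, retrieval time and charge can be estimated per object in advance, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical retrievable object with estimated properties."""
    name: str
    relevance: float   # estimated relevance, higher is better
    time_s: float      # estimated retrieval time in seconds
    cost: float        # estimated charge in dollars


def select(candidates, k, max_time_s, max_cost):
    """Greedily pick up to k candidates by relevance within both budgets."""
    chosen, time_used, cost_used = [], 0.0, 0.0
    for c in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        if len(chosen) == k:
            break
        # Skip any object that would exceed the time or the cost budget.
        if time_used + c.time_s <= max_time_s and cost_used + c.cost <= max_cost:
            chosen.append(c)
            time_used += c.time_s
            cost_used += c.cost
    return chosen, time_used, cost_used


if __name__ == "__main__":
    pool = [
        Candidate("A", 0.9, 30.0, 3.0),
        Candidate("B", 0.8, 40.0, 0.5),
        Candidate("C", 0.7, 20.0, 0.5),
        Candidate("D", 0.1, 5.0, 0.1),
    ]
    picked, t, c = select(pool, k=5, max_time_s=60.0, max_cost=4.0)
    print([p.name for p in picked], t, c)
```

The heuristic assumes sequential retrieval, so estimated times add up, and it ignores the time and cost of the selection step itself. A metareasoning approach would weigh that overhead as well.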