Typed and structured systems for wide-area information management

This dissertation presents a model for typed and structured information content. The primary goal of the research is to use such content to improve wide-area information management. Specifically, the dissertation addresses content representation, access accuracy and information manipulation, with the aim of increasing both the effectiveness of wide-area resource discovery and the usability of wide-area content.

The research is based on the hypothesis that well-structured information improves wide-area information management in two ways. First, it raises the quality of resource discovery by making it possible to organize, classify and discover wide-area information; better resource discovery, combined with precise search and retrieval, increases the accuracy of access. Second, it improves the utilization of accessed information by enabling intelligent caching. Consequently, well-structured information increases the usability of wide-area information.

To verify the hypothesis, the dissertation defines a typed, structured representation for distributed information and demonstrates usability improvements through system construction, precision/recall experiments on structured information, and performance simulations of wide-area accesses. Experiments with publicly available testbeds are presented to show the improvements obtained with structured documents: standard test collections are converted into structured documents, and precision and recall are computed over these documents to evaluate the quality of information retrieval. To demonstrate the functionality of typed, structured documents, we also present prototypes of wide-area resource discovery systems built on structured information. Each prototype demonstrates the usability of typed, structured documents in a different way.
In particular, we constructed a distributed white-pages service, a prototype file system and an agent-based system for automated browsing. Finally, we present experimental research on type-aware performance improvements, using actual user logs of wide-area accesses. Specifically, we discuss a tool built and deployed to collect wide-area access traces, along with simulations and analyses that evaluate the effects of type-aware caches on wide-area accesses. The dissertation concludes with a discussion of future research directions that stem from these ideas.
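Precision and recall, the measures used in the retrieval experiments described above, have standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The following minimal Python sketch computes both for a single query. The function name `precision_recall` and the document identifiers in the example are hypothetical illustrations, not taken from the dissertation's test collections, and returning 0.0 for an empty retrieved or relevant set is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query (hypothetical helper).

    retrieved: iterable of document IDs returned by the system.
    relevant:  iterable of document IDs judged relevant for the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant documents that the system actually retrieved.
    hits = len(retrieved_set & relevant_set)
    # Assumed convention: an empty set in the denominator yields 0.0.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical query: 4 documents retrieved, 3 of them among the 5 relevant ones.
    p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

This set-based form treats the result list as unranked; evaluations over ranked output typically report precision at several recall levels or cutoffs instead.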