Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies

As large numbers of text databases have become available on the Internet, it has become harder to locate the right sources for a given query. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, which keeps statistics on the available databases to estimate which of them are potentially the most useful for a given query. gGlOSS extends our previous work, which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is small enough that it could be widely replicated, even at end-user workstations.
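
To make the idea of ranking databases from summary statistics concrete, here is a minimal Python sketch. It is an illustration only, not the estimators defined in the paper: the abstract does not say which statistics gGlOSS keeps, so the per-term document count and total weight, along with the names `DatabaseSummary`, `estimate_goodness`, and `rank_databases`, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseSummary:
    """Hypothetical per-database summary held by a broker."""
    name: str
    # Assumed statistics: term -> (number of documents containing the term,
    #                              sum of the term's weights over those documents)
    stats: dict[str, tuple[int, float]] = field(default_factory=dict)


def estimate_goodness(summary: DatabaseSummary, query_weights: dict[str, float]) -> float:
    """Score a database as the sum, over query terms, of
    query weight * total weight of that term in the database.
    Terms absent from the summary contribute nothing."""
    return sum(
        q_weight * summary.stats.get(term, (0, 0.0))[1]
        for term, q_weight in query_weights.items()
    )


def rank_databases(summaries: list[DatabaseSummary], query_weights: dict[str, float]) -> list[str]:
    """Return database names ordered by decreasing estimated goodness,
    dropping databases whose estimate is zero. Ties break by name."""
    scored = [(estimate_goodness(s, query_weights), s.name) for s in summaries]
    useful = [(score, name) for score, name in scored if score > 0]
    return [name for _, name in sorted(useful, key=lambda pair: (-pair[0], pair[1]))]


# Made-up summaries for three databases.
summaries = [
    DatabaseSummary("A", {"database": (10, 4.0), "retrieval": (5, 2.0)}),
    DatabaseSummary("B", {"database": (2, 1.0)}),
    DatabaseSummary("C", {"cooking": (7, 3.0)}),
]
query = {"database": 1.0, "retrieval": 1.0}
ranking = rank_databases(summaries, query)
```

The broker in this sketch never touches the documents themselves, only the compact summaries, which is what would let a small top-level broker be replicated widely. The document counts are stored but unused here; a fuller estimator could combine them with the weights.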
