The Effectiveness of GlOSS for the Text Database Discovery Problem

The popularity of on-line document databases has led to a new problem: finding which text databases, out of many candidates, are most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the size of the result that a query would produce at each database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates the effectiveness of GlOSS using a trace of real user queries. In addition, we analyze the storage cost of our approach.
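The abstract does not spell out how the result size is estimated. As a rough illustration only, the sketch below assumes that each database exports its document count and per-term document frequencies, that queries are conjunctions of terms, and that terms occur independently of one another. The function names and the summary layout are hypothetical, not taken from the paper.

```python
from math import prod


def estimate_result_size(db_size, doc_freq, query_terms):
    """Estimate how many documents in one database contain all query terms.

    Assumption (not from the abstract): terms occur independently, so the
    fraction of matching documents is the product of per-term fractions.
    """
    if db_size == 0:
        return 0.0
    return db_size * prod(doc_freq.get(t, 0) / db_size for t in query_terms)


def rank_databases(summaries, query_terms):
    """Return (name, estimate) pairs with a nonzero estimate, largest first.

    `summaries` maps a database name to (document count, {term: doc frequency}).
    """
    estimates = (
        (name, estimate_result_size(size, freqs, query_terms))
        for name, (size, freqs) in summaries.items()
    )
    return sorted(
        ((name, est) for name, est in estimates if est > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical summaries, made up for illustration.
summaries = {
    "A": (100, {"text": 50, "database": 20}),
    "B": (1000, {"text": 10, "database": 10}),
    "C": (10, {"text": 5}),
}
ranking = rank_databases(summaries, ["text", "database"])
```

Under these assumptions a query is routed only to databases with a nonzero estimate, and the only per-database state kept is the document count and the term frequencies, which is where the storage cost arises.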
