The Efficacy of GlOSS for the Text Database Discovery Problem

The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query and a database. The method is termed GlOSS--Glossary of Servers Server. The second part of this paper evaluates GlOSS using four different semantics to answer a user''s queries. Real users'' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem.