Database Selection for Longer Queries

Given the enormous amount of information now available on the Web, search engines have become indispensable tools for finding information. These search engines can be classified into two broad categories by the extent of their coverage. The first category consists of general-purpose search engines, such as Google and Yahoo, which attempt to index the whole Web and provide a search capability for all Web documents. Unfortunately, these centralized search engines suffer from several serious limitations, such as poor scalability and the difficulty of keeping their contents fresh (Hawking and Thistlewaite, 1999).

[1]  King-Lup Liu, et al. Detection of heterogeneities in a multiple text database environment, 1999, Proceedings Fourth IFCIS International Conference on Cooperative Information Systems. CoopIS 99 (Cat. No.PR00384).

[2]  C. Lee Giles, et al. Accessibility of information on the web, 1999, Nature.

[3]  David Hawking, et al. Methods for information server selection, 1999, TOIS.

[4]  King-Lup Liu, et al. Efficient and effective metasearch for text databases incorporating linkages among documents, 2001, SIGMOD '01.

[5]  Steve Kirsch. Infoseek's experiences searching the internet, 1998, SIGF.

[6]  Oren Etzioni, et al. Query routing for Web search engines, 2000.

[7]  Amanda Spink, et al. Real life information retrieval: a study of user queries on the Web, 1998, SIGF.

[8]  Donna K. Harman, et al. Overview of the Sixth Text REtrieval Conference (TREC-6), 1997, Inf. Process. Manag.

[9]  King-Lup Liu, et al. Efficient and effective metasearch for a large number of text databases, 1999, CIKM '99.

[10]  Claire Cardie, et al. Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification, 1998, ACL.

[11]  Adele E. Howe, et al. Experiences with selecting search engines using metasearch, 1997, TOIS.

[12]  Kui-Lam Kwok, et al. Improving two-stage ad-hoc retrieval for short queries, 1998, SIGIR '98.

[13]  Clement T. Yu, et al. A highly scalable and effective method for metasearch, 2001, TOIS.

[14]  King-Lup Liu, et al. Building efficient and effective metasearch engines, 2002, CSUR.

[15]  W. Meng, et al. A Methodology for Retrieving Text Documents from Multiple Databases (submitted for publication), 2007.

[16]  Stephen E. Robertson, et al. Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive, 1998, TREC.

[17]  Luis Gravano, et al. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies, 1995, VLDB.

[18]  Clement T. Yu, et al. Towards a highly-scalable and effective metasearch engine, 2001, WWW '01.

[19]  W. Bruce Croft, et al. Cluster-based language models for distributed retrieval, 1999, SIGIR '99.

[20]  W. Bruce Croft, et al. Searching distributed collections with inference networks, 1995, SIGIR '95.

[21]  King-Lup Liu, et al. Finding the most similar documents across multiple text databases, 1999, Proceedings IEEE Forum on Research and Technology Advances in Digital Libraries.

[22]  Ellen M. Voorhees, et al. The Collection Fusion Problem, 1994, TREC.

[23]  King-Lup Liu, et al. A Methodology to Retrieve Text Documents from Multiple Databases, 2002, IEEE Trans. Knowl. Data Eng.

[24]  King-Lup Liu, et al. Estimating the usefulness of search engines, 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[25]  Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[26]  King-Lup Liu, et al. A Statistical Method for Estimating the Usefulness of Text Databases, 2002, IEEE Trans. Knowl. Data Eng.

[27]  Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[28]  Kenneth Ward Church. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text, 1988, ANLP.

[29]  King-Lup Liu, et al. Determining Text Databases to Search in the Internet, 1998, VLDB.

[30]  Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1999, SODA '98.

[31]  James P. Callan, et al. Effective retrieval with distributed collections, 1998, SIGIR '98.

[32]  Oren Etzioni, et al. Query routing for Web search engines: architecture and experiments, 2000, Comput. Networks.

[33]  Andrei Z. Broder, et al. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines, 1998, Comput. Networks.

[34]  Luo Si, et al. Using sampled data and regression to merge search engine results, 2002, SIGIR '02.

[35]  Christoph Baumgarten, et al. A probabilistic model for distributed information retrieval, 1997, Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

[36]  Jan O. Pedersen, et al. Phrase recognition and expansion for short, precision-biased queries based on a query log, 1999, SIGIR '99.

[37]  B. Huberman, et al. The Deep Web: Surfacing Hidden Value, 2000.