Global vs. localized search: A comparison of database selection methods in a hierarchical environment

In this work, we compare standard global IR searching with more localized techniques for addressing the database selection problem. We conduct a series of experiments comparing the retrieval effectiveness of three separate search modes in a hierarchically structured data environment of textual database representations. The environment is represented as a tree-like structure containing over 15,000 unique databases and approximately 100,000 total leaf nodes. The search modes combine browsing and searching to varying degrees, ranging from a global search at the root node to a refined search at a sub-node that uses dynamically calculated inverse document frequencies (idfs) to score the candidate databases for probable relevance. Our findings indicate that, in this environment, a browse-plus-search approach that relies on localized searching from sub-nodes produces the most effective results.
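The difference between a global search at the root and a localized search at a sub-node can be illustrated with a small sketch. The abstract does not give the actual scoring function, so this example assumes a simple one: idf is computed as log(N / df) over only the databases in the chosen subtree, and each database is scored by summing log(1 + tf) × idf over the query terms. The names `Node`, `collect_databases`, `local_idf`, and `rank`, along with the toy databases and terms, are hypothetical and chosen only for illustration. This is not the authors' implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical database hierarchy.

    Interior nodes only have children; leaf nodes carry one database's
    name and its textual representation as term -> frequency counts.
    """
    name: str
    children: list[Node] = field(default_factory=list)
    terms: dict[str, int] = field(default_factory=dict)


def collect_databases(node: Node) -> dict[str, dict[str, int]]:
    """Return the unique databases (by name) in the subtree rooted at node."""
    if not node.children:
        return {node.name: node.terms}
    found: dict[str, dict[str, int]] = {}
    for child in node.children:
        # A database listed under several leaves is counted once (assumes
        # every copy carries the same representation).
        found.update(collect_databases(child))
    return found


def local_idf(dbs: dict[str, dict[str, int]], term: str) -> float:
    """Assumed idf formula, log(N / df), computed over this subtree only."""
    df = sum(1 for rep in dbs.values() if rep.get(term, 0) > 0)
    return math.log(len(dbs) / df) if df else 0.0


def rank(node: Node, query: list[str]) -> list[tuple[str, float]]:
    """Score every database under node; rank(root, q) is the global search."""
    dbs = collect_databases(node)
    # idfs are recalculated for whichever node the search starts from.
    idf = {t: local_idf(dbs, t) for t in set(query)}
    scores = {
        name: sum(math.log1p(rep.get(t, 0)) * idf[t] for t in query)
        for name, rep in dbs.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy hierarchy with two sub-nodes under the root.
science = Node("science", [
    Node("db_a", terms={"enzyme": 8, "kinetics": 1}),
    Node("db_b", terms={"enzyme": 1, "kinetics": 8}),
    Node("db_c", terms={"enzyme": 2}),
])
other = Node("other", [
    Node("db_d", terms={"kinetics": 3}),
    Node("db_e", terms={"kinetics": 2}),
    Node("db_f", terms={"kinetics": 5}),
])
root = Node("root", [science, other])

query = ["enzyme", "kinetics"]
global_order = [name for name, _ in rank(root, query)]
local_order = [name for name, _ in rank(science, query)]
print(global_order)  # ['db_a', 'db_b', 'db_c', 'db_f', 'db_d', 'db_e']
print(local_order)   # ['db_b', 'db_a', 'db_c']
```

In this toy case, "kinetics" is common across the whole tree, so the global search favors `db_a`. Under the `science` sub-node, however, every database contains "enzyme", so its local idf drops to zero, and the ranking shifts toward `db_b`. Recomputing idfs per sub-node is what lets the localized search reweight terms that discriminate only within the browsed region.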
