WIDIT: Fusion-Based Approach to Web Search Optimization

To facilitate both the understanding and the discovery of information, we need to utilize multiple sources of evidence, integrate a variety of methodologies, and combine human capabilities with those of the machine. The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the School of Library and Information Science, Indiana University-Bloomington, houses several projects that apply this idea of multi-level fusion to information retrieval and knowledge discovery. This paper describes a Web search optimization study by the TREC research group of WIDIT, which explores a fusion-based approach to enhancing retrieval performance on the Web. In the study, we employed both static and dynamic tuning methods to optimize the fusion formula that combines multiple sources of evidence. By static tuning, we refer to the typical stepwise tuning of system parameters based on training data. Dynamic tuning is an interactive process in which system parameters are fine-tuned based on cognitive analysis of immediate system feedback; its key idea is to combine human intelligence, especially pattern recognition ability, with the computational power of the machine. The rest of the paper is organized as follows. The next section discusses related work in Web information retrieval (IR). Section 3 details the WIDIT approach to Web IR, Section 4 describes our experiment using the TREC .gov data, and Section 5 discusses the results.
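As a rough illustration of what a fusion formula and its static tuning can look like, the sketch below combines per-source retrieval scores with a weighted sum and then tunes the weights one at a time against training relevance judgments. It is not the WIDIT formula: the min-max normalization, the weighted-sum form, the precision-at-k objective, the source names `text` and `anchor`, and all function names are assumptions made for this example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1] so sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(sources, weights):
    """Weighted-sum fusion (assumed form): sum over sources of weight * normalized score."""
    fused = {}
    for name, scores in sources.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[name] * s
    return fused


def precision_at_k(fused, relevant, k):
    """Fraction of the top-k fused documents that are judged relevant."""
    ranked = sorted(fused, key=lambda d: (-fused[d], d))[:k]
    return sum(1 for d in ranked if d in relevant) / k


def tune_weights(sources, relevant, k, grid):
    """Stepwise static tuning: adjust one weight at a time, keeping the others fixed."""
    weights = {name: 1.0 for name in sources}
    best = precision_at_k(fuse(sources, weights), relevant, k)
    for name in sources:
        for w in grid:
            trial = dict(weights, **{name: w})
            score = precision_at_k(fuse(sources, trial), relevant, k)
            if score > best:  # keep a change only if it strictly improves the objective
                best, weights = score, trial
    return weights, best


# Hypothetical training data: two evidence sources and one relevant document.
sources = {
    "text": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "anchor": {"d1": 0.0, "d2": 1.0, "d3": 5.0},
}
relevant = {"d3"}
tuned_weights, tuned_score = tune_weights(sources, relevant, k=1, grid=[0.5, 1.0, 2.0])
```

Dynamic tuning, as described above, would replace the automatic loop in `tune_weights` with a person who inspects the ranked output after each change and chooses the next weight to try.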
